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Abstract 



Parameter ensembles or sets of random effects constitute one of the cornerstones of modern statis- 
tical practice. This is especially the case in Bayesian hierarchical models, where several decision 
theoretic frameworks can be deployed to optimise the estimation of parameter ensembles. The 
reporting of such ensembles in the form of sets of point estimates is an important concern in epi- 
demiology, and most particularly in spatial epidemiology, where each clement in these ensembles 
represent an epidemiological unit such as a hospital or a geographical area of interest. The esti- 
mation of these parameter ensembles may substantially vary depending on which inferential goals 
are prioritised by the modeller. Since one may wish to satisfy a range of desiderata, it is therefore 
of interest to investigate whether some sets of point estimates can simultaneously meet several 
inferential objectives. In this thesis, we will be especially concerned with identifying ensembles of 
point estimates that produce good approximations of (i) the true empirical quantilcs and empirical 
quartile ratio (QR) and (ii) provide an accurate classification of the ensemble's elements above 
and below a given threshold. For this purpose, we review various decision-theoretic frameworks, 
which have been proposed in the literature in relation to the optimisation of different aspects of 
the empirical distribution of a parameter ensemble. This includes the constrained Bayes (CB), 
weighted-rank squared error loss (WRSEL), and triple-goal (GR) ensembles of point estimates. In 
addition, we also consider the set of maximum likelihood estimates (MLEs) and the ensemble of 
posterior means -the latter being optimal under the summed squared error loss (SSEL). Firstly, 
we test the performance of these different sets of point estimates as plug-in estimators for the 
empirical quantilcs and empirical QR under a range of synthetic scenarios encompassing both 
spatial and non-spatial simulated data sets. Performance evaluation is here conducted using the 
posterior regret, which corresponds to the difference in posterior losses between the chosen plug-in 
estimator and the optimal choice under the loss function of interest. The triple-goal plug-in es- 
timator is found to outperform its counterparts and produce close-to-optimal empirical quantiles 
and empirical QR. A real data set documenting schizophrenia prevalence in an urban area is also 
used to illustrate the implementation of these methods. Secondly, two threshold classification 
losses (TCLs) -weighted and unweighted- are formulated. The weighted TCL can be used to 
emphasise the estimation of false positives over false negatives or the converse. These weighted 
and unweighted TCLs arc optimised by a set of posterior quantiles and a set of posterior medians, 
respectively. Under an unweighted classification framework, the SSEL point estimates are found 
to be quasi-optimal for all scenarios studied. In addition, the five candidate plug-in estimators are 
also evaluated under the rank classification loss (RCL), which has been previously proposed in the 
literature. The SSEL set of point estimates are again found to constitute quasi-optimal plug-in 
estimators under this loss function, approximately on a par with the CB and GR sets of point 
estimates. The threshold and rank classification loss functions are applied to surveillance data 
reporting methicillin resistant Staphylococcus aureus (MRSA) prevalence in UK hospitals. This 
application demonstrates that all the studied plug-in classifiers under TCL tend to be more liberal 
than the optimal estimator. That is, all studied plug-in estimators tended to classify a greater 
number of hospitals above the risk threshold than the set of posterior medians. In a concluding 
chapter, we discuss some possible generalisations of the loss functions studied in this thesis, and 
consider how model specification can be tailored to better serve the inferential goals considered. 
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Chapter 1 

Introduction 



An important concern in epidemiology is the reporting of ensembles of point estimates. In disease 
mapping, for example, one may wish to ascertain the levels of risk for cancer in different regions 
of the British Isles (Jarup et al., 2002), or evaluate cancer mortality rates in different adminis- 
trative areas (Lopez- Abente et al., 2007). In public health, one may be interested in comparing 
different service providers such as neonatal clinics (MacNab et al., 2004). Estimation of parameter 
ensembles may also be of interest as performance indicators, such as when compiling league tables 
(Goldstein and Spiegelhalter, 1996). Thus, one of the fundamental tasks of the epidemiologist is 
the summarising of data in the form of an ensemble of summary statistics, which constitutes an 
approximation of a 'parameter ensemble'. 

Such a task, however, is complicated by the variety of goals that such an ensemble of point 
estimates has to fulfil. A primary goal, for instance, may be to produce element-specific point 
estimates, which optimally reflect the individual level of risk in each area. Alternatively, one may 
be required to select the ensemble of point estimates that best approximate the histogram of the 
true parameter ensemble (Louis, 1984). A related but distinct desideratum may be to choose the 
set of point estimates that gives a good approximation of the true heterogeneity in the ensemble. 
This is especially important from a public health perspective since unexplained dispersion in the 
ensemble of point estimates may indicate the effect of unmeasured covariates. Naturally, there does 
not exist a set of point estimates, which simultaneously optimise all of these criteria. However, 
reporting several ensembles of point estimates corresponding to different desiderata can yield 
to some inconsistencies, which generally lead epidemiologists to solely report a single set of point 
estimates when communicating their results to the public or to decision makers. There is therefore 
a need for considering how certain ensembles of point estimates can satisfy several epidemiological 
goals at once. 

A natural statistical framework within which these issues can be addressed is Bayesian decision 
theory. This approach relies on the formulation of a particular loss function and the fitting of a 
Bayesian model to the data of interest. The former formalises one's inferential goals, whereas 
the latter permits to derive the joint posterior distribution of the parameters of interest, which 
will then be used for the optimisation of the loss function. Once these two ingredients are speci- 
fied, standard arguments in decision theory imply that an optimal set of point estimates can be 
obtained by minimising the posterior expected loss function. In spatial epidemiology, the use of 
Bayesian methods, thanks to the now wide availability of computational resources, has increas- 
ingly become the procedure of choice for the estimation of small-area statistics (Lawson, 2009). 
This has paralleled an increase in the amount of multilevel data routinely collected in most fields 
of public health and in the social sciences. The expansion of Bayesian methods has especially been 
motivated by an increased utilisation of hierarchical models, which are characterised by the use of 
hierarchical priors (Best et al., 1996, Wakefield et al., 2000, Gelman et al., 2004). This family of 
models has the desirable property of borrowing strength across different areas, which permits to 
decrease the variability of each point estimate in the ensemble. 

Such Bayesian models are generally used in conjunction with summed of squared error loss 
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(SSEL) function, whose optimal estimator is the set of posterior means of the elements of the 
parameter ensemble of interest. The SSEL is widely used in most applications because it produces 
estimators, which are easy to handle computationally, and often readily available from MCMC 
summaries. Despite being the most popular loss function in both spatial and non-spatial epi- 
demiology, this particular choice of loss function, however, remains criticised by several authors 
(Lehmann and Casella, 1995, Robert, 1996). In particular, the use of the quadratic loss has been 
challenged by researchers demonstrating that the posterior means tend to overshrink the empirical 
distribution of the ensemble's elements (Louis, 1984). The empirical variance of the ensemble of 
point estimates under SSEL can indeed be shown to represent only a fraction of the true empirical 
variance of the unobserved parameters (Ghosh, 1992, Richardson et al., 2004). 

Due to these limitations and motivated by a need to produce parameter ensemble estimates 
that satisfy other epidemiological desiderata, several authors have suggested the use of alternative 
loss functions. Specifically, Louis (1984) and Ghosh (1992) have introduced a constrained loss 
function, which produce sets of point estimates that match both the empirical mean and empirical 
variance of the true parameter ensemble. Other authors have made use of the flexibility of the 
ensemble; distribution to optimise the estimation of certain parts of the empirical distribution 
to the detriment of the remaining ones. This has been done by specifying a set of weights (j), 
which emphasise the estimation of a subset of the elements of the target ensemble (Wright et al., 
2003, Craigmile et al., 2006). A particularly successful foray in simultaneously satisfying several 
inferential objectives was achieved by Shen and Louis (1998), who proposed the use of triple-goal 
ensembles of point estimates. These sets of point estimates constitute a good approximation of the 
empirical distribution of the parameter ensemble. Moreover, their ranks are close to the optimal 
ranks. Finally, they also provide good estimates of element-specific levels of risk. 

Two specific goals, however, do not appear to have hitherto been fully addressed in the lit- 
erature on Bayesian decision theory. These are (i) the estimation of the empirical quantiles and 
the empirical quartile ratio (QR) of a parameter ensemble of interest, and (ii) the optimisation 
of the classification of the elements of an ensemble above or below a given threshold. The first 
objective lies close to a need of evaluating the amoimt of dispersion of a parameter ensemble. The 
estimation of the QR indeed constitutes a good candidate for quantifying the variability of the 
elements in the ensemble, which can then be related to the presence or absence of unmeasured risk 
factor. While some epidemiologists have considered the problem of choosing a specific measure 
of dispersion for parameter ensembles (Larsen et al., 2000, Larsen and Merlo, 2005), these meth- 
ods have been formulated in a frequentist framework and little work has been conducted from a 
Bayesian perspective. 

The second goal, which we wish to explore in this thesis, is the classification of elements in a 
parameter ensemble. Increased interest in performance evaluation and the development of league 
tables in education and health has led to the routine gathering of surveillance data, which permit to 
trace the evolution of particular institutions over time. Despite the range of drawbacks associated 
with these methods (Goldstein and Spiegelhalter, 1996), a combination of factors has made the 
use of surveillance data particularly popular. Excess mortality rates following paediatric cardiac 
surgery in Bristol Royal Infirmary, for instance, has raised awareness about the significance of this 
type of data (Grigg et al., 2003). The Shipman inquiry, in addition, has stressed the need for a 
closer monitoring of mortality rates in general practices in the UK (Shipman-Inquiry, 2004). These 
factors have been compounded by the public and political attention to hospital-acquired infections 
such as methicillin resistant Staphylococcus aureus (MRSA) or Clostridium difficile (Grigg and 
Spiegelhalter, 2007, Grigg et al.. 2009). Such developments are reflected by the recent creation of 
a new journal entitled Advances in Disease Surveillance, published by the International Society for 
Diseases Surveillance in 2007. While some statistical work has focused on the monitoring of disease 
counts over time (Spiegelhalter, 2005, Grigg et al., 2009), few authors have considered the problem 
of classifying the elements of a parameter ensemble in low- and high-risk groups (Richardson et al., 
2004). Such classifications, however, may be particularly useful for the mapping of risk in spatial 
epidemiology, where diff'erent groups of areas could be assigned diff'erent colours, according to each 
group's level of risk. 

However, while the estimation of the empirical quantiles and empirical QR or the classification 
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of the elements of a parameter ensemble can be conducted optimally, these optimal estimators may 
not succeed to meet other inferential goals. One of the central themes of this thesis will therefore 
be to identify sets of point estimates that can simultaneously satisfy several inferential objectives. 
For this purpose, we will study the behaviour of various plug-in estimators under the specific 
loss functions of interest. Plug- in estimators are computed by applying a particular function to a 
candidate set of point estimates. In order to evaluate the performance of each of these candidate 
ensembles in comparison to the optimal choice of estimator under the loss functions of interest, 
we will compute the posterior regret associated with the use of that specific candidate plug-in 
estimator. We will compare different plug-in estimators using spatial and non-spatial simulations, 
as well as two real data sets documenting schizophrenia and MRSA prevalence. 

This thesis is organised as follows. In chapter 2, we describe the general principles of decision 
theory and present the specific modelling assumptions commonly made in epidemiology and spatial 
epidemiology. Several loss functions, which have been proposed for the estimation of parameter 
ensembles in hierarchical models are introduced with their respective optimal estimators. Estima- 
tors resulting from these loss functions will be used throughout the thesis as plug-in estimators 
under other loss functions of interest. Chapter 3 is specifically dedicated to the optimisation of 
the empirical quantiles and empirical QR of a parameter ensemble, which permit to evaluate the 
amount of dispersion in the ensemble distribution. In chapter 4, we consider the issue of clas- 
sifying the difFcirent cilcmients of a parameter ensemble above or below a given threshold of risk, 
which particularly pertains to the analysis of surveillance data. Finally, in chapter 5, we consider 
some possible extensions and generalisations of the techniques proposed in this thesis, with spe- 
cial emphasis on the specification of tailored Bayesian models, which may better serve the target 
inferential goals. 
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Chapter 2 



Loss functions for Parameter 
Ensembles 

Summary 

In this introductory chapter, wc briefly introduce the general decision-theoretic 
framework used in Bayesian statistics, with special attention to point estimation 
problems. We present three commonly used loss functions: the squarcd-error 
loss, the absolute error loss and the 0/1-loss. A working definition of Bayesian 
hierarchical models is then provided, including the description of three specific 
families of models characterised by different types of prior structure, which will 
be used throughout the thesis. The concept of a parameter ensemble in a hi- 
erarchical model is also defined with some emphasis on the optimal estimation 
of the functions of parameter ensembles. Issues related to hierarchical shrinkage 
are reviewed with a brief discussion of the Ghosh-Louis theorem. Several com- 
monly adopted decision-theoretic approaches to the estimation of a parameter 
ensemble are also introduced, including the constrained Bayes estimator (CB), 
the triple-goal estimator (GR) and the weighted-rank squared-error loss estima/- 
tor (WRSEL). Finally, we discuss plug- in estimators, which consist of functions 
of an ensemble of point estimates. Differences between such plug-in estimators 
and the optimal estimators of various functions of parameter ensembles will be 
the object of most of the thesis at hand. In particular, the performance of optimal 
and plug-in estimators will be studied within the context of two inferential ob- 
jectives relevant to epidemiological practice: (i) the estimation of the dispersion 
of parameter ensembles and (ii) the classification of the elements of a parameter 
ensemble above or below a given threshold. We conclude with a description of 
the posterior regret, which will be used throughout the thesis as a criterion for 
performance evaluation. 

2.1 Bayesian Decision Theory 

In this section, we briefly introduce the premises of decision theory with special attention to point 
estimation. We also consider the differences between the frequentist and the Bayesian approach 
to the problem of point estimation, and describe three classical loss functions. A discussion of the 
specific issues arising from the estimation of a function of the model's parameters is also given, as 
this is especially relevant to the thesis at hand. Note that, throughout this chapter and the rest of 
this thesis, we will not emphasise the niceties of measure theory, but restrict ourselves to the level 
of formality and the notation introduced by Berger (1980) and Robert (2007). Unless otherwise 
specified, all random variables in this thesis will be assumed to be real-valued. 
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2.1.1 Premises of Decision Theory 

Decision theory formalises the statistical approach to decision making. Here, decision making 
should be understood in its broadest sense. The estimation of a particular parameter, for instance, 
constitutes a decision as we opt for a specific value among a set of possible candidates. The 
cornerstone of decision theory is the specification of a utility function. The main strength and 
appeal of decision theory is that once a utility function has been selected, and a set of alternative 
options has been delineated, the decision problem is then fully specified and the optimal decision 
can be derived. 

A rigorous approach to decision theory is based on the definition of three spaces. Firstly, let 
6 denotes the space of the true states of the world. Secondly, the decision space, denoted 2?, 
will comprise the decisions -sometimes referred to as acts, actions or choices- that one can take. 
A third space encompasses the consequences of a particular course of action. These are often 
expressed in monetary terms, and for that reasons are generally termed rewards. This last space 
will be denoted Z. The true states of the world, the decision space and the space of consequences 
are linked together by a loss function defined as follows, 

L:exDh-s.Z, (2.1) 

where x denotes the Cartesian product. A loss function is thus a criterion for evaluating a possible 
procedure or action 5 G 2?, given some true state of the world G ©. This loss function takes values 
in the space of consequences Z. A decision problem is therefore fully specified when the above 
four ingredients are known: (Q^T), Z, L). Note that, in axiomatic treatments of decision theory, 
the aforementioned quadruple is generally replaced by (O, V, Z, where ^ is a total ordering on 
the space of consequences. Providing several properties of ^ hold (e.g. transitivity, asymmetry), 
the existence of a loss function L can be demonstrated. Decision theory originated in the context 
of game theory (von Neumann and Morgenstern, 1944). and was adapted to statistics by Savage 
(1954). The formalisation of a decision problem as the quadruple {Q,T), Z, :<) is probably the 
most accepted definition (Fishburn, 1964, Krcps, 1988, DcGroot, 1970, Robert, 2007), although 
some other authors also include the set of all possible experiments, £, in the definition of a decision 
problem (Raiffa and Schlaifer, 1960). 

In this thesis, we will be especially interested in parameter estimation. Since the space of the 
true states of the world is the set of values that the parameter of interest can take, we will simply 
refer to Q as the parameter space. In estimation problems, the space of decisions, denoted V, 
is generally taken to be identical to the parameter space, O. In addition, both spaces are also 
usually defined as subsets of the Real line. That is, we have 

D = e C R. (2.2) 

For convenience, we will assume in this section that 6 is univariate and study the multivariate 

case in section . Moreover, the space of consequences is defined as the positive half of the Real 
line [0, +cxd). Thus, when considering point estimation problems, such as the ones of interest in 
the present thesis, a decision problem will be defined as (0, 9, [0, +oo), L), with L satisfying 

L : 9 X 9 [0,+oo). (2.3) 

Albeit our discussion has centered on the specification of a particular loss function, historically, 
the definition of a total ordering on Z has been associated with the choice of an arbitrary utility 
function (see von Neumann and Morgenstern, 1944). However, the use of an unbounded utility 
leads to several problems, as exemplified by the Saint Petersburg's paradox. (This paradox involves 
a gamble, which results in an infinite average gain, thereby leading to the conclusion that an 
arbitrarily high entrance fee should be paid to participate in the game (sec Robert, 2007, for a full 
description).) As a result, the codomain of the utility function is usually taken to be bounded, 
with U :9x9i-)-(— 00,0]. This gives the following relationship with our preceding definition of 
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the loss function in estimation problems, 

U{e,5) = -L{e,5). (2.4) 

In the context of decision theory, point estimation problems can be naturally conceived as 
games, where the player attempts to minimise her losses (see Berger, 1980). Such losses, however, 
are associated with some level of uncertainty. That is, there exists a space V of probability 
distributions on Z. The total ordering on the space of consequences Z can then be transposed 
onto the probability space V , using the expectation operator. An axiomatic construction of a total 
ordering ^ on Z leading to the proof of the existence of a utility or loss function can be found in 
several texts (see Savage, 1954, Fisliburn, 1964, Kreps, 1988) or see Robert (2007) for a modern 
treatment. The purpose of the game is therefore to minimise one's loss, by selecting the optimal 
decision, 5* , which is defined as follows, 

5* := argminE[L(6',5)], (2.5) 

where := indicates that the LHS is defined as the RHS. Here, the two schools of thought in statistics 
differ in their handling of the expectation operator. Prequentist and Bayesian statisticians have 
traditionally chosen different types of expectations, as we describe in the next section. 



2.1.2 Frequentist and Bayesian Decision Theories 

From a statistical perspective, the specification of the space of the states of the world O and the 
decision space T> are complemented by the definition of a sample of observations, which we will 
denote by y := {yi, ...,?/„}, where y G 3^ C M". We assume that this sample was generated from 
a population distribution p{'y\9), with G O. Our decision 5 is therefore computed on the basis 
of this sample, and we will denote this dependency by 5{y). Following Robert (2007), we will use 
the term estimator to refer to the function 5{-). That is, an estimator is the following mapping, 

S-.y^V, (2.6) 

where T) is taken to be equivalent to the space of states of the world, O. An estimate, by contrast, 
will be a single vahie in 7? = O, given some observation y. 

From a frequentist perspective, the experiment that has generated the finite sample of ob- 
servations y is assumed to be infinitely repeatable (Robert, 2007). Under this assumption of 
repeatability, the optimal decision rule 5 can be defined as the rule that minimises the expectation 
of the loss function with respect to the (unknown) population distribution p{y\9). This gives 

RF[e,5{y)] := j L{e,5{y))p{y\6)dy. (2.7) 
y 

The quantity is called the frequentist risk, whereas p{y\9) can be regarded as the likelihood 
function of traditional statistics, evaluated at the true state of the world, 9 

In the context of Bayesian theory, by contrast, we specify a prior distribution on the space 
of the states of the world, Q. This distribution, denoted p{9), reflects our imccrtainty about the 
true state of the world. Given these assumptions and the specification of a particular prior, it is 
then possible to integrate over the parameter space. This gives the following Bayesian version of 
equation (2.7), 

RB\p{6)Ay)] ■■=11 L{9,5{y))p{y\e)p{6)dyde, (2.8) 

ey 

which is usually termed the Bayes risk. Note that Rb takes the prior distribution p{9) as an 
argument, since its value entirely depends on the choice of prior distribution on 6. The optimal 
decision 5 that minimises Rb in equation (2.8) can be shown (using Fubini's theorem and Bayes' 
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rule, see Robert (2007) for a proof) to be equivalent to the argument that minimises the posterior 
expected loss, which is defined as follows, 



p \p{e\y), S{y)] := I m 5{y))pi9\y)d9, (2.9) 
e 

where p{0\y) is the posterior distribution of 6, obtained via Bayes' rule, 

^(^1^) - Upiy\o)pie)dO- ^'-''^ 

Since minimisation of p for all y is equivalent to the minimisation oi Rb, we will generally use the 

terms Bayes risk and posterior (expected) loss interchangeably. The optimal decision under both 
the Bayes risk and the posterior loss is termed the Bayes choice, action or decision. 

For notational convenience, we will use the following shorthand for the posterior loss, 

p[e,5{y)] ■.= p[p{e\y),5{y)], (2.11) 

where the dependence of p on the posterior distribution p{9\y), and the dependence of the decision 
5{y) on the data are made implicit. In addition, since in this thesis, we will adopt a Bayesian 
approach to statistical inference, all expectation operator will be defined with respect to the 
appropriate posterior distribution of the parameter of interest, generally denoted 9, except when 
otherwise specified. 

The defining condition for the Bayes choice to exist is that the Bayes risk is finite. In the 
preceding discussion, we have assumed that the prior distribution on 9 is proper. When specifying 
improper priors, the Bayes risk will, in general, not be well defined. In some cases, however -when 
an improper prior yields a proper posterior - the posterior expected loss will be well defined. The 
decision that minimises p in that context is then referred to as the generalised Bayes estimator. 
Several commonly used models in spatial epidemiology make use of improper prior distributions, 
which nonetheless yield proper posteriors. In such cases, we will be using the generalised Bayes 
estimator in the place of the Bayes choice. 

We thus readily see why decision theory and Bayesian statistics form such a successful combi- 
nation. The use of a prior distribution in Bayesian models completes the assumptions about the 
space of the states of the world in decision theory. By contrast, the frequentist perspective runs 
into difficulties by conditioning on the true parameter ^. As a result, the decisions that minimise 
Rp are conditional on 9, which is unknown. In a Bayesian setting, by contrast, the expected loss 
is evaluated conditional on y, which does not constitute a problem since such observations are 
known (see also Robert, 2007, for a discussion). 

Decision theory puts no constraints on the nature of the loss function that one may utilise. 
It should be clear from the foregoing discussion that the specification of the loss function will 
completely determine which decision in T) is considered to be optimal, everything else being 
equal. In practice, the choice of a particular loss function is dependent on the needs of the 
decision maker, and which aspects of the decision problem are of interest. However, to facilitate 
comparisons between different estimation problems, there exists a set of loss functions which are 
widely accepted and used throughout the statistical sciences. These classical loss functions are 
described in the following section. 



2.1.3 Classical Loss Functions 

There exist three classical loss functions, which constitute the building blocks of many other loss 
functions, and arc therefore of key importance to our development. They are the following; (i) 
the squared error loss, (ii) the absolute value error loss and (iii) the 0/1-loss. We review them 
in turn, alongside their corresponding minimisers. These three loss functions are especially useful 
for estimation problems, and will therefore be described with respect to some known sample of 
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observations, y. 



Squared Error Loss 

The quadratic loss or squared error loss (SEL) is the most commonly used loss function in the 
statistical literature. It is defined as the Euclidean distance between the true state of the world 
and the candidate estimator i5(y). Thus, we have 

SEL(0,%)):=(0-%))2. (2.12) 

In a Bayesian context, this loss function can be minimised by integrating SEL(0, 5{y)) with respect 
to the posterior distribution of 9, and minimising the posterior expected loss with respect to (5(y). 
Since the SEL is strictly convex, there exists a unique minimiser to SEL, which is the posterior 
mean: E[(^|y]. We will usually denote E[(^|y] by (9^^^", when comparing that particular estimator 
with other estimators based on different loss functions. 



Absolute Value Error Loss 

Another classical loss function is the absolute error loss (AVL), which makes use of the modulus 
to quantify one's losses. It is defined as follows, 

AVL(0,<5(y)):=|0-,5(y)|, (2.13) 

and its minimiser in the context of Bayesian statistics, is the posterior median, which will be 
denoted 0^^. It can be shown that the posterior median is not the unique minimiser of the 
posterior expected AVL (see Berger, 1980). In addition, since the posterior median will be found 
to be the optimal Bayes choice under other loss functions, it will be useful to denote this quantity 
without any reference to a specific loss function. We will therefore use Q^^^ to denote the 
posterior median in this context. This quantity will be discussed in greater details in chapter 4, 
when considering the classification of the elements of a parameter ensemble. 

0/1 -Loss 

For completion, we also introduce the 0/1-loss function. It is defined as follows, 

The optimal minimiser of the 0/1-loss under the posterior distribution of 6 is the posterior mode. 
In the sequel, we will drop any reference to the data y, when referring to the estimator 5{y), and 
simply use the notation 5. We now turn to a more specific aspect of loss function optimisation, 
which will be especially relevant to the research questions addressed in this thesis. 



2.1.4 Functions of Parameters 

It will be of interest to consider the estimation of functions of parameters, say h{6)^ where h 
is some mapping from 9 to itself, or possibly to a subset of the true states of the world. For 
convenience, we will assume in this section that 9 is univariate, and study the multivariate case in 
section 2.2.3. In such cases, the Bayesian decision problem can be formulated as follows. Choose 
the i5 , which is defined as, 

5' := argminp (/i(6'), 6) , (2.15) 
s 

where the minimisation is conducted over the decision space, here defined as V C Q. When the 
loss function of interest is the SEL, we then have the following equality, for any arbitrary function 
h{-) of the parameter 9, 

E[h{9)\y] = argminE \{h{9) - Sf y] . (2.16) 
5 L J 
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Figure 2.1. Directed Acyclic Graph (DAG) for a general hierarchical model, with y = {yi, ■ ■ ■ ,yn} 
denoting n observations and 6 = {9i, . . . ,6n} denoting the parameter ensemble of interest. The prior 

distribution of each 6i is controlled by a vector of liypcrparametcr ^, which is given a hyperprior. Following 
standard convention, arrows here indicate direct dependencies between random variables. 

That is, the optimal estimator 6 G I? of h{9) is the posterior expectation of h{9). Here, the 
decision problem is fully specified by defining both the parameter space O and the decision space 
T), as the codomain of h{-). 

If, in addition, h{-) is a linear function, it then follows from the linearity of the expectation 
that 

E[hi9)\y]=h{E[e\y]). (2.17) 

However, for non-linear h(-), this relationship does not hold. Let p denotes an arbitrary Bayesian 
expected loss for some loss function L, we then have the following non-equality, 

S" := aigmin p{h{e),S) ^ h ^argmin p (6', ^)^ =: /i(6l*). (2.18) 

Much of this thesis will be concerned with the differences between 5 and h{9 ), for particular 
p's. An estimator of h{9) based on an estimator of 6 will be referred to as a plug-in estimator. 
Thus, in equation (2.18), h{9 ) is the plug-in estimator of h{9). Our use of the term plug-in, 
here, should be distinguished from its utilisation in reference to the plug-in principle introduced 
by Efron and Tibshirani (1993) in the context of the bootstrap. For Efron and Tibshirani (1993), 
plug-in estimators have desirable asymptotic properties in the sense that such estimators cannot 
be asymptotically improved upon. In the present thesis, however, estimators such as h{9 ) are not 
deemed to be necessarily optimal, except in the sense that 9 is the Bayes action for some p (9, S). 

Specifically, it will be of interest to evaluate whether commonly encountered optimal Bayes 
actions, such as the posterior means and medians can be used to construct quasi-optimal plug-in 
estimators. This problem especially arises when the true parameter of interest is an ensemble of 
parameters, as described in the next section. 

2.2 Parameter Ensembles and Hierarchical Shrinkage 

The notion of a parameter ensemble will be the object of most of our discussion in the ensuing 
two chapters. In this section, we define this concept for general Bayesian models as well as for 
specific hierarchical models that will be considered throughout the thesis. In addition, we present 
the Ghosh-Louis theorem, which originally motivated research on the specific inferential problems 
associated with parameter ensembles. 
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2.2.1 Bayesian Hierarchical Models 

Bayesian hierarchical models (BHMs) can be conceived as Bayesian extensions of traditional mixed 
models, sometimes called multilevel models, where both fixed and random effects are included in a 
generalized linear model (see Demidenko, 2004). In its most basic formulation, a BHM is composed 
of the following two layers of random variables, 

y,'^ p{y^\0,,a,), g{9) ^ p{e\^), (2.19) 

for i = 1, . . . , n and where g{-) is a transformation of 6, which may be defined as a link function as 
commonly used in generalised linear models (see McCullagh and Nelder, 1989). The probability 
density function at the first level of the BHM, p{yi\9i,(Ti), is the likelihood function. The joint 
distribution on the g{6), here denoted p{d\$), is a multivariate prior, which is directly specified as 
a distribution on the transformed 9. Different choices of transformations, g{-), will hence induce 
different joint prior distributions. The specification of a BHM is complete when one fixes values for 
the vector of hyperparameters, ^, or specifies a hyperprior distribution for that quantity. When the 
^j's are independent and identically distributed (iid) given ^, we obtain the hierarchical structure 
described in figure 2.1 on page 22, which is represented using a directed acyclic graph (DAG). By 
extension, we will refer to the prior on such 0j's as an iid prior. 

In the sequel, we will assume that the n vectors of nuisance parameters CTj are known, albeit in 
practice, this may not be the case. Typically, such cr^'s will include the sampling variance of each 
j/j. In its simplest form, the model in equation (2.19) will be assumed to be composed of conjugate 
and proper probability density functions and each 6i will be an iid draw from a hierarchical 
prior. These modelling assumptions, however, will be sometimes relaxed in the present thesis. In 
particular, we will be interested in the following three variants. 

i. Conjugate proper iid priors on the Oi's, where the link function g{-) is the identity function. 

This simple case will encompass both the Normal-Normal model -sometimes referred to as 
compound Gaussian model- and the compound Gamma or Gamma-Inverse Gamma model. 
For the former, we have the following hierarchical structure, 

Vi N{eu af), Oi 7V(m, t% (2.20) 

with i = 1, . . . , n, and where the erf's are known variance components. Secondly, for the 
Gamma-Inverse Gamma model, we specify the following likelihood and prior, 

yi '~ Gam(ai, ^j), ~ Inv-Gam(Q:, ^), (2-21) 

where the Gamma and Inverse-Gamma distributions will be specifically described in section 
3.3.2. 

ii. Non-conjugate proper iid priors on the ^j's. In standard epidemiological settings, the speci- 

fication of a Poisson likelihood for the observations naturally leads to the modelling of the 
logarithm of the relative risks at the second level of the hierarchy. Such a model is sometimes 
referred to as a log-linear model. In such cases, however, the conjugacy of the likelihood 
with the prior on the 6'i's does not hold anymore. As an example of such non-conjugacy, we 
will study the following model, 

Vi Pois{eiEi), log N{a, r^), (2.22) 

for i = 1, . . . , n, where the link function g(-) := log(-). Here, the conjugacy of the prior with 
the Poisson likelihood does not hold, and specific sampling schemes need to be adopted in 
order to evaluate the posterior distributions of the 9i^s (see Robert and CascUa, 2004). 

iii. Non-conjugate improper non-iid priors on the 9i's. In this final situation, all assumptions on 

the prior distributions of the ^,'s will be relaxed. This type of model will be illustrated 
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in the context of spatial epidemiology, where Poisson-based generalized linear models are 
commonly used to model counts of disease cases in each of a set of geographical areas, and 
the joint prior distribution on the ^,'s models the spatial dependence between the regions of 
interest. A popular choice of prior distribution reflecting inter-regional spatial dependence 
is the intrinsic Gaussian conditional autoregressive (CAR) prior -an instance of a Markov 
random field. The intrinsic CAR does not constitute a proper distribution (Besag, 1974). 
However, Besag et al. (1991) have shown that the resulting marginal posterior distributions 
of the ^j's are proper (see also Besag and Kooperberg, 1995). It then follows that the 
posterior expected loss for the parameter ensemble is finite, and that we can therefore derive 
the generalised Bayes decision, which will be optimal for that decision problem. We describe 
this particular family of spatial models in more detail in the following section. 

In this thesis, when studying the properties of such epidemiological models, we will employ the 
term prevalence to refer to the rate of a particular condition in the population at risk. That is, 
the term prevalence here refers to the number of affected cases divided by the total number of 
individuals at risk for that condition. 

2.2.2 Spatial Models 

Here, we briefly present some conventional modelling assumptions made in spatial epidemiology, 
which will be used throughout the thesis (Wakefield et al., 2000). The starting point for modelling 
a non-infectious disease with known at-risk populations is the Binomial model, 

VijlPij '^Bm{Nij,pij), i = l,...,n; j = l,...,J, (2.23) 

where Py and N^j represent, respectively, the risk of disease and the population size in the i^^ area 
for the j"^ age strata. In this thesis, we are especially interested in modelling the prevalence of 
rare non-infectious diseases, such as specific types of cancers. For rare conditions, we approximate 
this model with a Poisson model 

VijlPij ~ Pois{NijPy). (2.24) 

Furthermore, we generally make a proportionality assumption, which states that Pij = 9i x pj, 
where 9i is the relative risk (RR) for the i^^ area, and pj is the reference rate for age strata j, 
which is assumed to be known. Each 6i is here the ratio of the age-standardised rate of the disease 
in area i compared to the age-standardised reference rate. Using the proportionality assumption, 
we may then sum over the risk in each strata in order to obtain 

Vi Pois{eiEi), (2.25) 

where yi = Vij ^^'^ = ^ijPj the observed and expected counts, respectively. 

Equation (2.25) is the likelihood of the model. Maximum likelihood, in this context, produces the 
following point estimate ensemble, 

^'MLE ._ (2.26) 

for every i = 1, . . . ,n, which are generally referred to as the standardised mortality or morbidity 
ratios (SMRs). The ^f^^'s, however, tend to be over-dispersed in comparison to the true RRs. 

In order to provide such a model with more flexibility, different types of hierarchical priors are 
commonly specified on the Oi^s (see Wakefield et al., 2000, for a review). Two spatial BHMs will be 
implemented in this thesis. A popular hierarchical prior in spatial epidemiology is the convolution 
prior (Besag et al., 1991), which is formulated as follows, 

log6i=Vi + Ui, (2.27) 

for every region i = l,...,n. Note, however, that this model will be implemented within the 
WinBUGS software, which uses a different representation of the spatial random effects, based on 
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the joint specification of an intercept with an improper flat prior and a sum-to-zero constraint on 
the Ui (see Appendix C, and section 3.4.3). 

Here, v is a vector of unstructured random effects with the foUowing speciflcation, 

Vi'^N{0,T^), (2.28) 

and the vector u captures the spatial auto-corrclation between neighbouring areas. Priors on each 
element of u are specified conditionally, such that 

«>j,Vi^i~7vf^^^,^V (2.29) 

with di is defined as the s(^t of indices of the neighbours of the i^^ area. Formally, di := {j ^ i : 
j = 1, . . . , n}, where i ^ j implies that regions i and j arc neighbours, and by convention, i ^ di- 
Moreover, in equation (2.29), we have also used rrii :— \di\ -that is, rrii is the total number of 
neighbours of the area. Therefore, each of the u,;'s is normally distributed aroimd the mean 
level of risk of the neighbouring areas, and its variability is inversely proportional to its number 
of neighbours. 

A different version of this model can be formulated using the Laplace distribution (Besag 
et al., 1991). As this density function has heavier tails, this specification is expected to produce 
less smoothing of abrupt changes in risk between adjacent areas. For the CAR Laplace prior, 
we therefore have Ui\uj,yj ^ i ~ L{ui\^j^Q,Uj/mi,T^/mi), for every i = l,...,n, using the 
following definition of the Laplace distribution, 

L(a;|mo, sq) ■■= exp < — |mo - a;| I . (2.30) 
zso I So ) 

We will refer to this model as the CAR Laplace or LI model. The rest of the specification of 
this BHM is identical to the one chosen for the CAR Normal. In these two models, following 
common practice in spatial epidemiology, two regions are considered to be neighbours if they 
share a common boundary (Clayton and Kaldor, 1987, Besag et al., 1991, Waller et al., 1997). 



2.2.3 Estimation of Pcirameter Ensembles 

The set of parameters, ^^^'s, in the hierarchical model described in equation (2.19) is generally 
referred to as a vector of random effects (Demidenko, 2004). In this thesis, we will refer to such a 
vector as an ensemble of parameters. That is, in a BHM following the general structure presented 
in equation (2.19), the vector of parameters, 

0:={^i,...,^„}, (2.31) 

will be referred to as a param,eter ensemble. Several properties of a parameter ensemble may 
be of interest. One may, for instance, wish to optimise the estimation of each of the individual 
elements in the ensemble. A natural choice in this context is the sum of quadratic losses for each 
parameter. This particular loss function is the summed squared error loss (SSEL) function that 
takes the following form, 

n 

SSEL (6>, 6>^"*) = ^{0i- 6if *) ^ , (2.32) 

In this context, using the notation that we have adopted in section 2.1.1, the decision problem for 
the estimation of a parameter ensemble using the SSEL function results in a parameter space 
and a decision space 2? which arc both assumed to be subsets of R" . The posterior expected loss 
associated with the loss function in equation 2.32 can be minimised in a straightforward manner. 
Its optimum is attained when selecting the vector of posterior means as Bayes choice, which will 
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be denoted by 

^SSEL i^ssEL^ ^ ^ssEL| ^ {E[e,\y], . . . ,EK|y]}. (2.33) 

This type of decision problems, characterised by the estimation of a set of parameters are sometimes 
referred to as compound estimation problems or compound loss functions (Ghosh, 1992). We will 
use of the term estimator to refer to an entire set of point estimates, such as 0^^^^ with respect to 
a particular loss function, here SSEL. This indeed follows from the fact that such a multivariate 
estimator is optimal under that posterior loss. Such an estimator, however, also constitutes an 
ensemble of single point estimates. 

Another property of a parameter ensemble, which may be of interest is the empirical dis- 
tribution function (EDF) of that ensemble, which will generally be referred to as the ensemble 
distribution. The EDF of 6 is defined as follows, 

1 " 

Fn{t):=-y2l{0i<t}, (2.34) 
1=1 

for every t e R, and where X is the usual indicator function. The term EDF is used here in 
analogy with its usual application in the context of iid observations. Note, however, that neither 
the elements of a true parameter ensemble nor the realisations of random effects can (in general) be 
assumed to be realisations of iid variables. A range of different loss functions may be considered in 
order to optimise the estimation of the EDF in equation (2.34). Previous authors have formalised 
this decision problem by using the integrated squared error loss function (ISEL), which takes the 
following form (see Shen and Louis, 1998), 

ISEL(F„, F-*) := J (F„(t) - F^-\t)fdt. (2.35) 

The posterior expectation of the ISEL can be easily minimised using Fubini's theorem to invert 
the ordering of the two integrals (i.e. the one with respect to t, and the one with respect to 6, 
see Shen and Louis (1998), for a formal proof). It then follows that the optimal estimator of 
E[ISEL(F„,i^^^')|y] is the posterior EDF, 

1 " 

F„(t) := E[F„(i)|y] = -^.m < t\y]. (2.36) 

When there may be ambiguity as to which parameter ensemble the posterior EDF is based on, we 
will emphasise the dependence on the vector 9, by using Fg{t). 

Finally, as discussed in section 2.1.4, one is sometimes interested in specific functions of a 
parameter. For the case of parameter ensembles, summary functions are often used to quantify 
particular properties of the ensemble distribution, such as the variance of the ensemble, for in- 
stance. These functions are generally real-valued, and the decision problem, in this context, can 
therefore be formalised as the standard quadruple: (R, R, [0,+oo),L), for some loss function L. 
When using the quadratic loss, we may have 

SEL(/i(6>), S) = {h{e) - Sf, (2.37) 

for a function of interest, /i : O, with C R" and C R. One may, for instance, wish to 
estimate the empirical variance of the parameter ensemble. This gives the following /i(-), 

1 " 

h{e):=-y2i(^i-9f, (2.38) 

1=1 

where 9 := Yl^=i^i- Naturally, in this particular case, the optimal Baycs estimator would 
be E[/i(0)|y]. However, as argued in section 2.1.4, since h{-) is a non-linear function of 6, the 
posterior empirical variance is different from the empirical variance of the posterior means. It 
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turns out that this particular observation has led to a key result in the study of BHMs, as we 
describe in the next section. 

2.2.4 HierEirchical Shrinkage 

Although the vector of posterior means is optimal under SSEL, their empirical variance is biased 

in comparison to the empirical variance of the true parameters of interest. The theorem that 
shows this systematic bias was first proved by Louis (1984) for the Gaussian compound case and 
further generalised by Ghosh (1992), who relaxed the distributional assumptions, but retained 
the modelling assumptions. In particular, whereas Louis (1984) proved this result for the case of 
conjugate models composed of Normal distributions, Ghosh (1992) showed that this relationship 
also holds for non-conjugate models based on arbitrary probability densities. 

Theorem 1 (Ghosh- Louis Theorem). Let a parameter ensemble 6 controlling the distribution 
of a vector of observations y, in a general two-stage hierarchical model as described in equation 
(2.19). Ifn>2, then 

2 

(2.39) 

where 6 := ^i/ with equality holding if and only if all {{6i — 6),...,{6n — 0)} have 

degenerate posteriors. 

A proof of this result is providcid in Ghosh (1992), and a weighted version is presented in 
Prey and Cressie (2003). The Ghosh-Louis theorem bears a lot of similarity with the law of total 
variance, which posits that Var(X) = E[Var(X|F)] + Var (E [X|F]). One may, however, note 
that there exists one substantial difference between this standard law of probability and the result 
at hand. The Ghosh-Louis theorem differs from the law of total probability in the sense that the 
former is conditioning on the data on both sides of the equation. 

The Ghosh-Louis theorem states a general property of BHMs. Hierarchical shrinkage is the 
under-dispersion of the empirical distribution of the posterior means in comparison to the posterior 
mean of the empirical variance of the true parameter ensemble. This should be contrasted to the 
commonly encountered issue of shrinkage in Bayesian mode;!, where a single posterior mean is 
shrank towards its prior mean. Although hierarchical shrinkage is here presented as a problem, it 
is often seen as a desirable property of BHMs; most especially when little information is available 
for each data point. This theorem has been used both to justify such a modelling decision and 
to highlight the limitations of this choice. In spatial epidemiological settings, Gelman and Price 
(1999) have shown that such shrinkage especially affects areas with low expected counts. Before 
reviewing the different decision-thcioretic solutions that have been proposed to produce better 
estimates of the empirical properties of parameter ensembles, we briefly introduce several statistical 
concepts, which will be useful in the sequel. 

2.3 Ranks, Quantile Functions and Order Statistics 
2.3.1 Ranks and Percentile Ranks 

We here adopt the nomenclature introduced by Laird and Louis (1989) on order statistics and rank 
percentiles as a guidance for our choice of notation. Of particular importance to our development 
is the definition of a rank. The rank of an element in a parameter ensemble is defined as follows, 

n 

Ri{d) := rank(6'i|6') = ^X{6'i > , (2.40) 
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where the smallest 9i in is given rank 1 and the largest 9i is given rank n. In equation (2.40), 
we have made explicit the fact that the rank of each 9^ depends on the entire parameter ensemble, 
9. In particular, this notation emphasises the fact that the function Ri{-) is non-linear in its 
argument. 

In the sequel, we will also make extensive use of percentile ranks (PRs). These are formally 
defined as follows, 

:= (2.41) 

n + 1 

Quite confusingly, percentile ranks are sometimes referred to as "percentiles", which should not 
be confused with the notion of percentiles discussed in section 2.3.2 in reference to quantiles. In 
general terms, the percentile rank of an element in a parameter ensemble is the percentage of 
elements in the corresponding EDF, which are lower or equal to the value of that rank. Rank per- 
centiles are especially useful in statistics when communicating ranking statistics to practitioners. 
Percentile ranks are empirical quantities, in the sense that they depend on the size of the param- 
eter ensemble, n. However, it can be shown that percentile ranks rapidly converge to asymptotic 
quantities, which are independent of n (Lockwood et al., 2004). 



2.3.2 Quantile Function 

The quantile function, denoted Q(p), of a continuous random variable X is formally defined as 
the inverse of the cumulative distribution function (CDF), F{x). Formally, since the function 
F : R I-)- [0, 1] is continuous and strictly monotonic, we can define 

Q{p)~F-\p), (2.42) 

for some real number p e [0, 1]. In general, the inverse function of the CDF of most random 
variables does not exist in closed-form. Among the rare exceptions are the uniform distribu- 
tion, unif (xja, 6), and the exponential distribution, exp(x|A), for which the quantile functions are 
Q{p\a,h) — (1 — p)a + pb and Q{p\X) = — log(l — p)/X, respectively (see Gilchrist, 2000). For 
discrete random variables, whose CDF may only be weakly monotonic, the quantile distribution 
function (QDF) is defined as 

Q{p) := inf {a; e R : F{x) > p} , (2.43) 

for every p e [0, 1]. Note that the latter definition holds for any arbitrary CDF, whether continuous 

or discrete. Like the CDF. the QDF is monotone non-decreasing in its argument. It is continuous 
for continuous random variables, and discrete for discrete random variables. However, whereas 
the F{x) is right-continuous, its inverse is, by convention, continuous from the left. This last 
property is a consequence of the use of the infimum in equation 2.43. When it is deemed necessary 
for clarity, we will specify the random variable for which a quantile is computed by a subscript, 
such as in Qxip), for the random variable X. More generally, we will distinguish between the 
theoretical QDF and the em,pirical QDF, by denoting the former by Q{p) and the latter by Qnip)- 
For some parameter ensemble of size n, we define the empirical QDF as follows, 

Qn{p) ■.= m.m{9i,...,en:Fniei)>p}, (2.44) 

where F„ is the EDF of the parameter ensemble as defined in equation (2.34). Note that our 
utilisation of the EDF, in this context, corresponds to a slight abuse of the concept. As afore- 
mentioned, the EDF of a particular random variable, assumes that several iid realisations of 
that random variable are available. In our case, given the hierarchical nature of the models under 
scrutiny, and the possible spatial structure linking the different geographical units of interest, such 
an iid assumption is not satisfied. Our use of the term empirical distribution should therefore be 
understood solely in reference to our chosen mode of construction for the discrete distribution of 
an ensemble of realisations. 
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When wc wish to emphasise the vector of parameters with respect to which the QDF and EDF 
are defined, we will use the notation, Qx and Fx, respectively. Sometimes, it will be useful to 
allow the function Q{-) to accept multivariate arguments, such as p := {pi, . . . ,Pk}- In such cases, 
the QDF will be vector-valued, such that 

Q(p) = . . . ,pk}) = {QiPi), QiPk)}. (2.45) 

In contrast to the sample mean, for which the relationship E[(7(X)] = (7(E[X]) only holds for 
linear function g{-), the quantile function satisfies 

Qh(X)(p) =/i[0(p)], (2.46) 

for every monotonic non-decreasing function h{-). In particular, we have the following useful 
transformation 

Qios{x){p)=log[Qx{p)], (2.47) 

which follows from the monotonicity of the logarithm. Moreover, we can recover the mean of X 
from its quantile function, by integrating the quantile function over its domain. That is, 

/ Qx{p)dp = E[X], (2.48) 

J [0,1] 

which is a standard property of the quantile function (see Gilchrist, 2000). When considering 
the quantile function of the standard cumulative normal distribution, denoted $(x), the quantile 
function specialises to the probit, defined as probit(p) := ^~^{p). The probit is widely used in 
econometrics as a link function for the generalised linear model when modelling binary response 
variables. 



2.3.3 Quantiles, Queirtiles and Percentiles 

In this thesis, we will be mainly concerned with the quantiles of parameter ensembles, these 
quantities will therefore be defined with respect to the EDF of a vector 6. In such cases, the p^^ 
empirical quantile of the ensemble is formally defined as 

0(p) ■■= Qeip), (2.49) 

where the .50*^ quantile is the empirical median. In the sequel, we will also make use of the .25**^, 
.50**^ and .75*^ quantiles, which are referred to as the first, second and third empirical quartiles, 
respectively. Of particular interest is the difference between the third and first quartiles, which 
produces the empirical interquartile range (IQR). For some parameter ensemble 6, we have 

IQR(0) :=^(.75)-e(.25)- (2.50) 

In some texts, the term quantile is used to classify different types of divisions of the domain 
of a probability distribution. More precisely, the fc*'' (j-quantilc is defined as Qg{k/q). In that 
nomenclature, the percentiles therefore correspond to the 100-quantiles, which would imply that 
the values taken by the quantile function are, in fact, percentiles. Other authors, however, have 
used the term quantile in a more straightforward manner, where the p^^ quantile is simply defined 
as Qeip), as in equation (2.49) (see, for instance, Koenker and Bassett, 1978). In the sequel, we 
adopt this particular definition of quantiles and only use the term percentiles to refer to percentile 
ranks, as introduced in section 2.3.1. 

There exist different techniques to derive the quantile function and quantiles of a particular 
CDF (sc!e Stciinbrecher and Shaw, 2008, for recent advances). One of the most popular methods has 
been the algorithm AS 241 (Wichura, 1988), which permits to compute the empirical p"' quantile 
of any finite parameter ensemble very efficiently. We will make use of this standard computational 
technique in chapter 3. 
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2.4 Loss Functions for Parameter Ensembles 



In this section, we review three important approaches, which have been proposed to address the 
issue of hierarchical shrinkage that wc have highhghted in the previous section. These three 
decision-theoretic perspectives will be specifically considered in chapters 3 and 4, and compared 
with other methods through the construction of plug-in estimators. Here, the parameter space 
satisfies © C R", and V = &, as before. 



2.4.1 Constrained Bayes 

In order to address the problem associated with hierarchical shrinkage, described in section 2.2, 
Louis (1984) developed a particular set of point estimates that automatically correct for hierar- 
chical shrinkage. This approach is a constrained minimisation problem, where SSEL(0, 9^'^^) = 
Sr=i(^i~^r')^ is minimised subject to the following two constraints: (i) the mean of the ensemble 
of point estimates is equal to the mean of the true ensemble. 



oest 



:=-Er* = -E^i=:^". (2.51) 
and (ii) the variance of the ensemble of point estimates is equal to the variance of the true ensemble, 

lY^ier-e-^)' = l^{e,-e)\ (2.52) 

1=1 1=1 

Based on these constraints, Louis (1984) derived the optimal Bayes estimator minimising the 
corresponding constrained SSEL function. Since this method is developed within a Bayesian 
framework, it is generally referred to as the constrained Bayes (CB) or constrained empirical 
Bayes (EB) estimator, albeit its use is not restricted to models estimated using EB techniques 
(see Rao, 2003). This result was further generalised by Ghosh (1992) for any distribution belonging 
to the exponential family. 

Theorem 2 (Ghosh-Louis Estimator). The minimiser of the SSEL under the constraints in equa- 
tion (2.51) and (2.52) is 

^CB ^^SSEL ^ (1 _ ^)|SSEL^ (2.53) 

where Of^^^ := ^0^\y], for every i = l,...,n, and P^^^ := l/nX;r=i ^i^^^ ^^^^ '^^^ 
empirical distribution of posterior means. The weight w is defined as 



1 + 



1/2 

n-iEr=iVar[0,|y] 



Er=i (< 



~ \ 2 

;SSEL _ ^SSEL j 



(2.54) 



where Var[^i|y] is the posterior variance of the i parameter in the ensemble. 

Proof. The minimisation of the posterior expected loss is a constrained minimisation problem, 
which can be solved using Lagrange multipliers (see Rao, 2003, page 221). □ 

The role played by the weight co is more explicitly demonstrated by transforming equation 
(2.53), in order to obtain the following 

^CB ^ |SSEL ^ ^ I^^SSEL _ |SSEL j _ (2.55) 

In addition, note that the expression in equation (2.53) resembles a convex combination of the 
corresponding SSEL estimator of the i**^ element with respect to the mean of the ensemble. This 
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is not the case, however, since the weight a; may take values greater than unity. Note also that 
the weights do not depend on the index i. This is a constant quantity, which is identical for all 
elements in the parameter ensemble. In particular, for the Gaussian compound case, described in 
equation (3.19), the CB estimates have a direct interpretation in terms of hierarchical shrinkage. 
Let 

y.'^'7V(0„a2), 0i'^Nif,,T% (2.56) 

where we assume that u^, and /x are known. Note that here, for convenience, we have assumed 
cr^ to be constant over all elements in the parameter ensemble, which differs from the model 
specification described in equation (3.19). The use of the Bayes' rule under quadratic loss yields 
the conventional posterior mean, ^p^L _ 7)//, where 7 is the countershrinkage parameter, 

defined as 

In this context, it can be shown that the Ghosh-Louis constrained point estimates bears some 
important structural similarities with the formulae of the posterior means for this model. We have 

^'p=y/\.+(i-y/^)M, (2.58) 

where < 7 < 1 (see Rao, 2003, p. 212), for every i = 1, . . . ,n, and where = means that the 
equality is an approximation. This is a particularly illuminating result, which intuitively illustrates 
how this estimator controls for the under-dispersion of the posterior means. The coutershrinkage 
parameter 7 is here given more weight by taking its square root, as opposed to the use of 7 in 
computing the ^f^^^'s. This produces a set of point estimates which are therefore more dispersed 
than the posterior means. 

This set of point estimates has very desirable asymptotic properties. As an ensemble, the mean 
and the variance of the CB point estimates converge almost surely to the mean and variance, 
respectively, of the true ensemble distribution (Ghosh and Rao, 1994). Furthermore, the CB 
estimator has a readily interpretable formulation as a shrinkage estimator, as we have seen for the 
compound Gaussian case. However, the CB approach also suffers from an important limitation: 
Its performance will be greatly influenced by the functional form of the true ensemble distribution. 
In particular, since the CB estimator only match the first two moments of the true ensemble, the 
empirical distribution of the CB point estimates may provide a poor approximation of the ensemble 
distribution, when the distribution of interest is substantially skewed. The next approach attempts 
to address this limitation by directly optimising the estimation of the EDF of the parameter 
ensemble. 



2.4.2 Triple-Goal 

The triple-goal estimator of a parameter ensemble was introduced by Shen and Louis (1998). It 
constitutes a natural extension of the CB approach. Here, however, instead of solely constraining 
the minimisation exercise with respect to the first two moments of the ensemble distribution, we 
consider three successive goals, which are optimised in turn (see also Shen and Louis, 2000). The 

set of point cistiniates resulting from these successive minimisations are generally referred to as the 
GR point estimates, where G denotes the EDF and R refers to the ranks. For consistency, we will 
therefore adhere to this acronym in the sequel (Shen and Louis, 1998). Note, however, that the 
GR point estimates are not optimal for these three goals, but will produce very good performance 
under each of these distinct objectives. These three consecutive steps can be described as follows. 

i. Firstly, minimise the ISEL function, as introduced in equation (2.35), in order to obtain an 
estimate Fn{t) of the ensemble EDF, F„(t), for some ensemble of interest, 6 = {9i, . . . , 
As we have seen in equation (2.36), the optimal estimate is the posterior EDF defined as 
follows, 

1 " 

F„(i) := E[F„(0|y] = -J2m < t\y]. (2.59) 

1=1 
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ii. Secondly, minimise the ranks squared error loss (RSEL) in order to optimise the estimation 

o£ the ranks of the parameter ensemble 9. Let R"''* := IV'''\9) and R := R{9) denote the 
vector of candidate ranks and the vector of true ranks, respectively. We wish to minimise 
the following loss function, 

n 

RSEL(R,R''''*) = -Y^(Ri- ■ (2-60) 

The vector of optimisers R of the posterior expected RSEL is composed of the following 
elements, 

n 

Ri := E[Ri\y] = ^P(^i > ^^ly). (2.61) 

The Ri's are not generally integers. However, one can easily transform the -R,'s into integers 
by ranking them such that 

Ri:=iank{Ri\K), (2.62) 

for every i = 1, . . . ,n. The iJj's are then used as optimal estimates of the true ranks, under 
RSEL. 

iii. Finally, we generate an ensemble of points estimates, conditional on the optimal estimate of 

the ensemble EDF, Fn, and the optimal estimates of the true ranks, i?j's. This is done by 
setting 

e?^ := (^^) , (2.63) 

for every i = 1, . . . , n. The —1 in the numerator of equation (2.63) arises from the minimi- 
sation of the posterior expected ISEL (see Shen and Louis, 1998, for this derivation). 

Despite the seeming complexity of the successive minimisations involved in producing the triple- 
goal estimators, the computation is relatively straightforward (more details can be found in Shen 
and Louis, 1998). We note, however, that one of the limitations of the triple-goal technique is 
that it heavily relies on the quality of the prior distribution -that is, the joint prior distribution 
for the Oi's, which we denoted by p{9\^) in equation (2.19). Specific non-parametric methods have 
been proposed to obtain an EB estimate of the prior distribution, which permits to attenuate 
these limitations. The smoothing by roughening (SBR) algorithm introduced by Shen and Louis 
(1999) is an example of such a technique. In such cases, the joint prior distribution, p{9\^), is 
estimated using the SBR algorithm. Estimation of the parameter ensemble is then conducted using 
standard Bayesian inference based on this SBR prior. Shen and Louis (1999) demonstrated good 
performance of this method using simulated data and rapid convergence of the SBR algorithm. 

2.4.3 Weighted and Weighted Ranks Loss Functions 

The standard SSEL framework can also be extended by the inclusion of a vector of weights within 
the quadratic loss. These weights, denoted 4'{^i)i in^'Y be specified as a function of the unknown 
parameters of interest, with ^ : R i-)- R. The weights may therefore be made to vary with each of 
the elements in 9. In addition, since loss fimctions are assumed to be strictly positive, all weights 
are also constrained to be positive: they satisfy (pi > 0, = 1, . . . ,n. The resulting weighted 
squared error loss (WSEL) is then 

n 

WSEL(0, 9, r^*) = J2 ^i^iWi - ^T')\ (2-64) 

i=l 
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with (j){9i) > for all i = 1, . . . , n. The set of optimal point estimates, denoted ^J^^^^'s, that 
minimise the posterior expected WSEL take the following form 

^™ :=E[.^(^,)^,|y]/E[.^(^,)|y], (2.65) 

for every i = 1, . . . ,n. Naturally, if the weights do not depend on the parameters of interest, then 
the optimal estimators reduce to the posterior means. That is, when the weights arc independent 
of 0, they can be extracted from the expectations and cancel out. In the ensuing discussion, we 
will therefore implicitly assume that the weights are functions of the 6, and they will be simply 

denoted by (pi's. 

The weighted ranks squared error loss (WRSEL) is a component-wise loss function that spans 
the entire set of order statistics of a given function (Wright et al., 2003). The vector of weights 
is here dependent on the ranks of each of the ff^'s. Taking 6/WRSEl ^j^g Bayes decision under 

WRSEL, we therefore have a compound loss function of the form, 

n n 

WRSEL(0, e, 0-*) = Y,Y1 -^J^i^^- = ^X^' - ^T'f, (2-66) 
i=i j=i 

where as before Rj := Ri{9) is the rank of the i^^ element in the parameter ensemble. For 
notational convenience, we write 

n 

WRSEL(0, e, 0««*) = ^ (fR. {0i - er'f, (2.67) 

where t^i/jj := J2^=i ^j^i^j — *} identifies the weight that is assigned to the i*^ rank. The Bayes 
action minimising the posterior expected WRSEL is an ensemble of point estimates ^wrsel^ 
whose elements are defined as 

^wRSEL. E;=l^.E[g.|i?,-»,ym=^|y) 

where V{Rj = i\y) is the posterior probability that the i^^ element has rank j. Each estimator 
^WRSEL therefore be seen as a weighted average of conditional posterior means of 9i, given 
that this element has rank j. Note that we can further simplify the formulae provided by Wright 
et al. (2003) by using the law of total expectation, J E{x\y)p{y)dy = ^{x), such that 



3WRSEL _ ^[^i'^flJy] 



(2.69) 



for every i = l,...,n, which is equivalent to the minimiser of the standard WSEL function 
presented in equation (2.64). 

In addition, Wright et al. (2003) noted that the WRSEL and the SSEL are equivalent up to a 
multiplicative constant, which is the vector of weights 0. That is, 

WRSEL = SSEL. (2.70) 

Moreover, the WRSEL trivially reduces to the SSEL when = 1„. It therefore follows that the 
WRSEL is a generalization of the SSEL. This family of loss functions allows the targeting of a 
range of different inferential goals by adjusting the shape of the vector of weights. 0. Wright 
et al. (2003) and Craigmile et al. (2006) proposed a vector of weights consisting of a bowl-shaped 
function of the rank of each of the ^j's, which emphasises estimation of the extreme quantiles of 
the ensemble distribution. Wright et al. (2003) makes use of the following specification for the 
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(2.71) 



for every i = 1, . . . ,n, where both oi and 02 are real numbers greater than 0. Different choices of 
fli and 02 permit to adjust the amount of emphasis put on the extremes of the EDF. In the sequel, 
when considering spatial models, these two parameters will be fixed to ai = 02 = 0.5, which is 
a symmetric version of the specification used by Wright et al. (2003). When considering non- 
spatial models, however, we will see that the shape of the ensemble of WRSEL point estimates 
is highly sensitive to the choice of ai and a2, as will be described in section 3.3.5. For such 
non-spatial models, we will choose the following 'softer' specification: ai = 02 = 0.05. Such 
symmetric specifications of the (j)i^s emphasises the estimation of the extrema over the estimation 
of other elements occupying the middle ranks of the parameter ensemble. Therefore, the ensemble 
distribution of WRSEL point estimators resulting from such a specification of the ^j's will be 
expected to be more dispersed than the empirical of the posterior means. 



2.5 Research Questions 

The main focus of this thesis is the estimation of functions of parameter ensembles. Given a 
particular BHM, we wish to estimate a function h{-) of the parameter ensemble 6. As we have 
seen in section 2.1.4, the optimal estimator of h{6) for some loss function L is the following, 

(5* := argminE[L(/i(6l),(5)|y] . (2.72) 

5 

Depending on the nature of the function h{-), the computation of the optimiser 6 could be 

relatively expensive. By contrast, computation of the ensemble of optimal point estimates, 0^ , 
for a variety of commonly used loss functions L' , is generally straightforward. Moreover, in the 
interest of consistency, one generally wishes to report a single set of point estimates that may 
simultaneously satisfy several inferential desiderata. Thus, we are interested in evaluating the 
performance of h{d^ ) as an alternative to the optimal estimator, 6 . For instance, one may 
evaluate the performance of a plug-in estimator based on the ensemble of posterior means, when 
this particular estimator i.e. h{6^^^^) is used in the place of S . 

A natural way of comparing the performance of different sets of estimators, is to utilise the 
posterior regret (Shen and Louis, 1998). The posterior regret expresses the penalty paid for using a 
sub-optimal estimator. That is, it is defined as the difference between the posterior loss associated 
with using the Bayes choice and the posterior loss incurred for using another estimator. Formally, 
the posterior regret for using S' under L is 

regret(L,(5') := E[L(6', ,5')|y] - minE[L(6l, (5)|y]. (2.73) 

5 

As for loss functions, regret(L, 5') > 0, with equality holding if and only if 5' = argminE[L(6', S)\y]. 
One of the first utilisations of this concept was Savage (1954), although not expressed in terms 
of posterior expectations. Shen and Louis (1998) specifically used the posterior regret when 
comparing ensembles of point estimators. This concept has also been used in more applied settings 
such as in computing credibility premiums in mathematical finance (Gomez-Deniz et al., 2006). 

For our purpose, we can specialise the definition in equation (2.73) to the estimation of func- 
tions of parameter ensembles in BHMs. We will be interested in considering the posterior regret 
associated with using a plug-in estimator, h{0^ ), for some loss functions L' . We will therefore 
compute the following posterior regret, 

regret(L,/i(0^')) = E[L(0,/i(0^'))|y] -minE[L(0,<5)|y], (2.74) 

o 

where the optimal estimator S under L may be vector-valued. In general, we will refer to the 
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quantity in equation (2.74) as the posterior regret based on L{6, h{9^ )). When comparing different 
plug-in estimators, the classical loss function, denoted L' in equation (2.74), will be taken to be 
one of the loss functions reviewed in section 2.4. In particular, following the research conducted 
in estimating parameter ensembles in BHMs, the L' function of interest will be taken to be the 
SSEL, WRSEL, constrained Bayes or triple-goal loss. Moreover, we will also evaluate the posterior 
regret for the ensemble of MLEs, as these estimators represent a useful benchmark with respect 
to which the performance of other plug-in estimators can be compared. For clarity, since optimal 
estimators under L' are vectors of point estimates, we will denote them as 0^ . In order to facilitate 
comparison between experimental conditions where the overall posterior expected loss may vary, 
it will be useful to express the posterior regret in equation (2.74) as a percentage of the minimal 
loss that can be achieved using the optimal minimiser. That is, we will generally also report the 
following quantity, 

T. ,r^T\ 100 X regrct(L, /i(0^')) 

PercRegret(L, ^0^ ) := ^^ (2-75) 

which we term the percentage regret. Both the posterior and percentage regrets will be used 
extensively in the ensuing chapters. In order to emphasise the distinction between these two 
quantities, the posterior regret in equation (2.74) will sometimes be referred to as the absolute 
posterior regret. 

Specifically, in chapters 3 and 4, we will study two aspects of parameter ensembles in BHMs. 
Firstly, motivated by current practice in both epidemiology and spatial epidemiology, we will 
consider the optimal determination of the heterogeneity or dispersion of a parameter ensemble. In 
this case, the function h{-) will either be the quantile function, as described in section 2.3.3 or a 
ratio of quartiles. Secondly, following an increased demand for the classification of epidemiological 
units in low-risk and high-risk groups -for surveillance purposes, for instance- we will concentrate 
our attention on optimising such classification. Here, the function h{-) will be defined as an 
indicator function with respect to some cut-off point of interest, denoted by C. Specifically, we 
will use two variants of the posterior regret introduced in equation (2.74) for the following choices 
of L and h{-): 

i. In chapter 3, we will optimise the estimation of the heterogeneity of a parameter ensemble. 

Our main focus will be on defining h{-) as a ratio of some of the quartiles of the parameter 
ensemble 9, with L being the quadratic loss. 

ii. In chapter 4, we will treat the problem of classifying the elements of a parameter ensemble 

below and above a particular threshold. Here, the function h{-) will be defined as the 
indicator function with respect to some cut-off point C, and the loss function of interest L 
will penalise misclassifications. 
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Chapter 3 



Empirical Quantiles and Quartile 
Ratio Losses 

Summary 

The amount of hotorogcnoity present in a parameter ensemble is crucial in both 
epidemiology and spatial epidemiology. A highly heterogeneous ensemble of RRs, 
for instance, may indicate the presence of a hidden risk factor not included in the 
model of interest. In modern BHMs, however, the quantification of the heterogene- 
ity of a parameter ensemble is rendered difficult by the utilisation of link functions 
and the fact that the joint posterior distribution of the parameter ensemble is gen- 
erally not available in closed-form. In this chapter, we consider the estimation of 
such heterogeneity through the computation of the empirical quantiles and em- 
pirical quartile ratio (QR) of a parameter ensemble. We construct a simple loss 
function to optimise the estimation of these quantities, which we term respectively 
the quantile squared error loss (Q-SEL) and the QR squared error loss (QR-SEL) 
functions. We compare the performance of the optimal estimators under these two 
losses with the use of plug-in estimators based on ensembles of point estimates 
obtained under standard estimation procedures using the posterior regret. These 
candidate plug-in estimators include the set of MLEs and the ensembles of point 
estimates under the SSEL, WRSEL, CB and triple-goal functions. In addition, 
we also consider the ratio of posterior quartiles (RoPQ), as a possible candidate 
for measuring the heterogeneity of a parameter ensemble. We evaluate the per- 
formance of these plug-in estimators using non-spatial and spatially structured 
simulated data. In these two experimental studies, we found that the RoPQ and 
GR plug- in estimators tend to outperform other plug-in estimators, under a vari- 
ety of simulation scenarios. It was observed that the performance of the WRSEL 
function is highly sensitive to the size of the parameter ensemble. We also noted 
that the CAR Normal model tends to yield smaller posterior losses when using the 
optimal estimator under both the Q-SEL and QR-SEL. In addition, we computed 
these optimal and plug-in quantities for a real data set, describing the prevalence 
of cases of schizophrenia in an urban population. On this basis, we draw some 
recommendations on the type of QR summary statistics to use in practice. 

3.1 Introduction 

In mixed effects models, which can be regarded as the frequentist version of BHMs, the main focus 
of the analysis has historically been on estimating fixed efi'ects -that is, the regression coefficients of 
particular covariates. However, since the values of the fixed effect parameters are highly dependent 
on the values taken by the random effects, statisticians' attention has progressively turned to 
the systematic study of random effects (Demidenko, 2004). Such a vector of random effects 
can be described as a parameter ensemble as introduced in section 2.2.1. A standard way of 
quantifying the amount of heterogeneity present in a parameter ensemble is to use the intraclass 
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coefBcient (ICG). The ICC can be computed whenever quantitative measurements are made on 
units organised into groups. This statistic constitute a useful complement to the analysis of 
variance (ANOVA), for instance, where individual units are grouped into N groups, with j = 
1,. . . ,N. The nested observations are then modelled as 



where Uj ~ A^(0,cr^) and eij ~ N{0,a'^). Here, the aj's are independent of the ejj's, and we 
assume that there is an identical number of units in each group, i.e. i = 1, . . . ,n. For this model, 
the ICC is defined as follows, 

ICC := (3.2) 

The ICC is a useful measure of parameter ensemble heterogeneity for general linear models that 
contains random effects. However, there docs not exist an equivalent measure for generalised linear 
models. Indeed, in generalised linear models, the random effects of interest affect the response 
through a link function, which transforms the distribution of the original random effects. Thus, 
the variance parameter controlling the variability of these random effects only provides a distorted 
image of the dispersion of the parameter ensemble of interest. 

Generalised linear models are commonly used in epidemiology and spatial epidemiology. In 
this context, the heterogeneity of a set of random effects may be indicative of the effect of hidden 
covariates that have not been included in the model. An accurate estimation of the dispersion 
of the random effects is therefore crucial for the detection of unexplained variability, which may 
then lead to the inclusion of new candidate risk factors. Most Bayesian models used in modern 
epidemiological practice, however, are hierarchical. In such models, the set of parameters of interest 
will be a non-linear function of one or more random effects as well as unit-specific covariates. 
The units may here be administrative areas within a region of interest such as hospitals, schools 
or other aggregates. For general BHMs, therefore, there does not exist a unique parameter, 
which controls for the variability of the unit-specific levels of risk. The variance parameters of 
several random effects can be added together, but this only constitutes a heuristic solution for 
quantifying the heterogeneity of the resulting empirical distribution of the parameters of interest. 
For such generalized linear models commonly used in epidemiology, there is therefore a need for a 
statistically principled way of estimating the heterogeneity of parameter ensembles. 

One of the first formal attempts at estimating the amount of heterogeneity present in a set 
of random effects was conducted by Larsen et al. (2000). These authors studied a mixed effects 
logistic regression model, and considered the variance of the ensemble distribution of random 
effects, as a potential candidate for a measure of dispersion. Formally, they considered N groups 
of individuals with n subjects in each group. The probability of the i*^ individual in the cluster 
to develop the disease of interest was modelled as 



where i = 1, . . . , n and j = 1, . . . ,N , and where the random effects are assumed to be independent 

and identically distributed such that aj ~ N{0, cr^), for every j. The /5fe's constitute a set of fixed 
effects associated with individual-specific if -dimensional vectors of risk factors denoted by Xj, for 
every i. Although the variance parameter, cr^, controls the distribution of the random effects, it 
only affects the probability of developing the disease of interest through the logistic map and the 
exact relationship between the variances at the first and second-level of the model hierarchy is 
therefore difficult to infer from the sole consideration of cr^ . 

Larsen et al. (2000) addressed this problem by assessing the amount of variability in the a^ 's 
by computing the median odds ratio (MOR). For two randomly chosen a^-'s drawn from the 
empirical distribution of the parameter ensemble, Larsen et al. (2000) considered the absolute 



Dij = IJ. + aj + etj, 



(3.1) 




(3.3) 
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difference between these two parameters. Since; the Q:j's arc iid normal variatcs centered at zero 
with variance ct^, as described above, it can be shown that the difference between two such random 
variates is Uj — afc ~ N{Q, 2cr^) for any pair of indices satisfying j ^ k (Spiegelhalter et al., 2004). 
Moreover, the absolute difference between two randomly chosen random effects is a truncated 
Gaussian, sometimes called a half-Gaussian, with mean zero and variance 20-^. Larsen et al. 
(2000) suggested taking the median of that distribution. For the aforementioned specification, the 
median of such a truncated Gaussian is $~^(0.75) x V^cr^ = I.OQct^. When the a^-'s are ORs on the 
logarithmic scale, the dispersion of the parameter ensemble becomes exp(1.09cr^) (see also Larsen 
and Merlo, 2005). The MOR has been met with singular success in the literature, and has been 
adopted by several authors in the medical and epidemiological sciences. In particular, it has been 
used in cancer epidemiology (e.g. Elmore et al., 2002) and in the study of cardiovascular diseases 
(e.g. Sundquist et al., 2004), as well as in the monitoring of respiratory health (e.g. Basagana 
et al., 2004). 

In a Bayesian setting, however, the MOR will only be available in closed-form if the posterior 
distribution of the random effect ensemble is also available in closed-form. This will rarely be the 
case in practice. For more realistic models, the computation of the MOR will typically require 
a substantial amount of computational power in order to be estimated. Specifically, the calcu- 
lation of the MOR is based on the distribution of the differences between two random variates. 
The exhaustive evaluation of such a distribution, however, will require the computation of (") 
differences for a parameter ensemble of size n. If one wishes to estimate this particular quantity 
within a Bayesian context, one could specify a quadratic loss on the MOR. The optimal estimator 
under that loss function is the posterior mean. It then follows that we would need to compute 
the median of the distribution of the absolute differences of any two randomly chosen parameters 
at each iteration of the MCMC algorithm used for fitting the model. Such an estimation scheme 
becomes rapidly unwieldy as n grows (for n = 2000, about 2 x 10^ differences, \aj — ak\, need to 
be computed at each MCMC iteration). The use of the MOR for Bayesian models thus appears 
to be impractical. 

A less computationally intensive method for quantifying the dispersion of a parameter ensemble 
is the use of the quartile ratio (QR). This quantity is defined as the ratio of the third to the first 
quartile of a parameter ensemble. The use of quartiles for measuring heterogeneity comes here 
with no surprise as these summary statistics have been shown to be especially robust to deviation 
from normality (Huber and Ronchetti, 2009). That is, the utilisation of quartiles as summary 
statistics of the data is less sensitive to small departures from general modelling assumptions. 
This robustness of the quartiles is therefore especially significant when the field of application is 
the generalised linear model, as in the present case. We will see that, for models with no covariates, 
the logratio of the third to the first quartile of the empirical distribution of the RRs is equivalent 
to the IQR of the random effects on their natural scale. Like the IQR, the ratio of the third to 
first quartiles of the RRs can thus be regarded as a non-parametric measure of dispersion. 

As any function of a parameter ensemble, the QR is amenable to standard optimisation proce- 
dures using a decision-theoretic approach. In this chapter, we will construct a specific loss function 
to quantify the estimation of this quantity. In addition, it will be of interest to optimise the esti- 
mation of the quantiles of the ensemble distribution. Specifically, we will study the computation of 
the optimal first and third quartiles of the ensemble distribution of the parameters of interest. As 
described in section 2.5, one of our goals in this thesis is to compare the use of optimal estimators 
with the performance of plug-in estimators, based on commonly used ensembles of point estimates. 
We will conduct these comparisons using the posterior regret, as introduced in chapter 2. The 
performance of the various plug-in estimators will be assessed by constructing two sets of data 
simulations. Different experimental factors deemed to influence the estimators' performance will 
be manipulated in order to precisely evaluate their effects. In particular, the synthetic data sets 
will be constructed using both a non-spatial and a spatial structure. The non-spatial simulations 
used in the sequel reproduce in part some of the simulation experiments conducted by Shen and 
Louis (1998), whereas the spatially structured risk surfaces will be based on the simulation study 
implemented by Richardson et al. (2004). 

In addition, we will also be evaluating the impact of using different types of plug-in estimators 
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for the estimation of the dispersion of parameter ensembles in a real data set. We will re- analyse 
a published data set describing schizophrenia prevalence in an urban setting. Delineating the 
geographical distribution of psychoses plays an important role in generating hypotheses about 
the aetiology of mental disorders, including the role of socioeconomic status (Giggs and Cooper, 
1987, Hare, 1956, HoUingshead and Redlich, 1958), urbanicity (Lewis et al., 1992), migration 
(Cantor-Graae et al., 2003), ethnicity (Harrison et al., 1997) and, more recently, of family history 
interactions that may betray effect modification by genes (Krabbendam and van Os, 2005, van Os 
et al., 2003). A raised prevalence of schizophrenia in urban areas was first observed over 75 years 
ago (Faris and Dunham, 1939), and has been consistently observed since (Giggs, 1973, Giggs and 
Cooper, 1987, Hafner et al, 1969, Hare, 1956, Lewis et al., 1992, Loffler and Haffner, 1999). Faris 
and Dunham (1939) demonstrated that the highest rates of schizophrenia occurred in inner-city 
tracts of Chicago with a centripetal gradient. More recently, a dose-response relationship between 
urban birth and the prevalence of schizophrenia has been reported in Sweden (Lewis et al., 1992), 
the Netherlands (Marcelis et al., 1998) and Denmark (Pedersen and Mortensen, 2001), suggesting 
that a component of the aetiology of schizophrenia may be associated with the urban environment. 
This specific data set is especially relevant to the task at hand, since it is characterised by low 
expected counts, which tend to lead to high levels of shrinkage in commonly used ensembles of 
point estimates (Gelman and Price, 1999); thereby providing a good testing ground for comparing 
the use of optimal dispersion estimators with the construction of plug-in estimators. 

This chapter is organised as follows. We will first describe our decision-theoretic approach to 
the estimation of both empirical quantiles and empirical QR in section 3.2. The construction of 
the non-spatial simulations as well as the comparison of the optimal estimators with some plug-in 
estimators of interest will then be reported in section 3.3. The specification of the spatial simulation 
scenarios and the effects of the different experimental factors considered will be described in section 
3.4. Finally, the analysis of the schizophrenia prevalence data set will be given in section 3.5, and 
some conclusions on the respective performance of the different studied estimators will be provided 
in section 3.6. 

3.2 Estimation of Empirical Quantiles and Quart ile Ratio 

We here introduce two specific loss functions, which formalise our approach to the estimation of 
parameter ensembles' quantiles and QR. Both of these loss functions are straightforward adapta- 
tions of the quadratic loss, which provides easily derivable optimal minimisers. In addition, we 
discuss the computation of the respective posterior regrets for these loss functions, as this will be 
used for evaluating the performance of the plug- in estimators of the quantities of interest. 

3.2.1 Empirical Quantiles 

The empirical quantiles of a parameter ensemble are especially relevant when one wishes to quantify 
the dispersion of the ensemble distribution of a parameter ensemble. As defined in section 2.3.3, 
the p*^ empirical quantile of an n-dimensional parameter ensemble 9 is 

Oip) ■= Qe(p), (3.4) 

where Qe{-) is the empirical QDF of 6, and p G [0, 1]. The quantity 9(^p^ is a non-linear function 
of 6. Optimal estimation of this quantity can be formalised by quantifying our loss using an SEL 
function on which takes the following form, 

SEL{Qe{p),S) = {e^^-6)\ (3.5) 

We saw in section 2.1.4 that the optimal estimator for such a posterior expected SEL is the 

posterior mean of the QDF of evaluated at p. Note that because 6'(p-| is a function of the entire 
parameter ensemble 6, it follows that the posterior expected loss depends on an n-dimensional 
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joint posterior distribution. When more than one quantile is of interest, one may use a quantiles 
squared error loss (Q-SEL) defined for some /c-dimensional vector p e [0, l]'^ as 

k k 

Q-SELp {e,S) := ^SEL {Qe{Pj),6j) = ^ (9^^^) - 6,)' , (3.6) 

where we have emphasised the dependence of the Q-SEL on the full parameter ensemble d. Note, 

however, that 6 and S will generally not have the same dimension. Here, 9 is n-dimcnsional 
whereas S is A;-dimensional. Thus, the posterior expected Q-SEL is minimised by a A;-dimensional 
vector 0Q-SEL^ which has the following elements, 

^?gf)^^ E [QeiPiM = E [9^,^) \ y] . (3.7) 

for every j = 1, . . . , fc, where the expectation is taken with respect to the joint posterior distribu- 
tion, p{9i, ...,9n\y). 

Parameter ensemble quantiles can also be estimated using plug-in estimators, which are based 
on empirical distributions of point estimates, as described in section 2.5. For any loss function L', 
we define the empirical quantiles of an ensemble of point estimates denoted 9^ , as follows, 

Ofp) ■■=■■ Qo-' ip) ■■= min {^f, . . . , : F^,, {9f' ) > p} , (3.8) 

for any p € [0, 1], where the EDF of the ensemble 9^ is defined as 

1 " 

Po-'it):=-Y,I{0^ <t}. (3.9) 

Note that equation (3.8) is solely the empirical version of the general definition of a quantile 
reported in equation (2.43) on page 28. The performance of such plug-in estimators of the posterior 
quantiles will be evaluated and compared to the optimal Baycs choice under Q-SEL. In particular, 
we will consider ensembles of point estimates based on the loss functions described in section 2.4. 
This will include the SSEL, WRSEL, CB and GR ensembles of point estimates, as well as the 
MLEs. 



3.2.2 Empirical Quartile Ratio 

A natural candidate for the quantification of the dispersion of a parameter ensemble is the quartile 
ratio (QR). This quantity is defined as the ratio of the third to the first empirical quartile of the 
vector of parameters of interest, 9. For a given BHM, of the type described in equation (2.19), 
the QR is defined as follows, 

(y0(.25) 6'(.25) 

where ^(.25) a-nd ^(.75) denote the first and third empirical quartiles, respectively. For generalized 
linear models, we will compute this quantity on the scale of the data -that is, as a ratio of ORs for 
a binomial likelihood, or as a ratio of RRs for a Poisson likelihood. When considering a log-linear 
model (e.g. Poisson likelihood combined with a normal prior on the log-intensities), the QR is 
related to the IQR introduced in section 2.3.3. If we take the logarithm of the QR, we obtain the 
IQR of the parameter ensemble on the prior scale, i.e. on the scale of the normal prior in the case 
of a log-linear model. We have 

log = log ^(.75) - log ^(.25) = IQR(^?(^)), (3.11) 

where g{-) is the log link function. 
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The decision problem associated with the estimation of the QR can be formulated using a 
quadratic loss taking the QR as an argument. For clarity, we will refer to that particular quadratic 
loss as the QR squared error loss (QR-SEL). This loss function is defined as follows, 



QR-SEL(6>,^) := SEL(QR(6'), 5) = ( - s) . (3.12) 

\^(.25) / 

The QR(-) function is a non-linear mapping of the parameter ensemble 6. Equation (3.12) is of 
the form L{h{0),6), where L is the SEL and h{-) is the QR. The minimiser of the corresponding 
posterior expected loss is therefore the following posterior quantity. 



QR(0|y) := E 



" ^'(.75) 




y 


- ^(-25) 



(3.13) 



which will be referred to as the posterior empirical QR, where recall that ^(.25) and ^(.75) denotes 
the empirical first and third quartiles oi 6 = {Oi,. . . ,6n}. 

It is of interest to compare the performance of this optimal estimator with some plug-in estima- 
tors, QR(^^ ). As before, the ensemble of point estimates 0^ has been obtained as the optimiser 
of L', where L' represents a commonly used compound loss function. This therefore gives the 
following plug-in QR estimator, 

ql' 

qR{d^') := (3.14) 

^(.25) 

where both numerator and denominator are here defined as in equation (3.8). Note that QR(0^ ) 

only depends on the data through the vector of point estimates, 6^ . An alternative solution to 
the problem of estimating the QR of a parameter ensemble is the direct utilisation of the Bayes 
choice for each quartiles. These quantities may have been estimated using the Q-SEL function 
described in equation (3.6). In such a case, once the minimiser of the posterior expected Q-SEL 
has been obtained, we can estimate the QR as the ratio of posterior empirical quartiles (RoPQ). 
That is, we have 

RoPQ(0|y).-^j^^^25)|y]-^Q-SEL' (^.15) 

where the numerator and denominator are components of the; optimal estimator under Q-SEL, as 
described in equation (3.7). Note that we have here drawn an explicit distinction between posterior 
empirical quartiles and quartiles of an ensemble of point estimates, which are respectively denoted 
by 9^-^ and , by using a hat to qualify the latter. 

The QR is a useful indicator of the dispersion of a parameter ensemble, when the elements of 
that ensemble are constrained to take positive values. For parameter ensembles with Normally 
distributed elements, however, one needs to use a different measure of dispersion. In such contexts, 
we will optimise the estimation of the IQR, as introduced in section 2.3.3. Such optimisation can 
be conducted using a quadratic loss on the IQR such that 

IQR-SEL(6>, S) := SEL(IQR(6>), (5) = ((^(.75) - ^(.25)) - S)^ . (3.16) 

The optimal estimator of the posterior expected IQR-SEL is the posterior mean IQR, 

IQR(0|y) := E[e^,r5) - ^(.25) |y]- (3-17) 

As for the posterior QR, plug-in estimators for this quantity can be defined by simply using the 
quartiles of the empirical distribution of the corresponding point estimates. Alternatively, one can 
also use two posterior empirical quartiles to estimate the IQR, as was done for the QR-SEL in 
equation (3.15). The latter estimator will be referred to as the difference of posterior empirical 
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quartiles (DoPQ), defined as follows, 



DoPQ(0|y) :=E[0(.75)|y] -E[^^(.25)|y]. (3.18) 

In the sequel, when dealing with ensembles of real-valued parameters taking both negative and 
positive values, we will automatically replace the QR-SEL with the IQR-SEL. 

3.2.3 Performance Evaluation 

The performance of these different quantities will be evaluated using simulated data, as well 
as compared in the context of real data examples. In the sequel, the posterior expected loss 
associated with the use of the optimal estimators under the Q-SEL and the QR-SEL functions 
will be compared to the posterior penalties incurred when using various plug-in estimators under 
these loss functions. In particular, we will be interested in assessing Q-SEL for p := {.25, .75}, 
as this leads to a direct comparison with the optimisation of the QR-SEL, which is related to the 
estimation of these specific quartiles. As described in section 2.5, we will therefore compute the 
following posterior regrets. 

In the notation of section 2.5, the function /i(-), which takes an ensemble of point estimates as 
an argument, becomes the empirical quantile function or the empirical quartile ratio depending on 
whether we wish to evaluate the posterior regret of the Q-SEL or the one of the QR-SEL function. 
Specifically, when the empirical quantiles are of interest, then equation (2.74) becomes 

regret (Q-SELp,Qg,,(p)) =E[Q-SELp (0,Qg,,(p))| y] - mhiE [Q-SELp(0,5)| y] , 

where the optimal estimator, S, minimising Q-SEL is the corresponding vector of posterior em- 
pirical quartiles, as described in equation (3.7). In the sequel, p will be taken to be {.25, .75}. In 
addition, we will use 

regret (qR-SEL,QR(0^')) =E QR-SEL (6», QR(0^')) y - mmE [QR-SEL(6', (5)| y] , 

when the focus is on the empirical quartile ratio, with the optimal estimator minimising QR-SEL 
being here the posterior empirical QR, as described in equation (3.13). A similar posterior regret 
can be defined for the IQR-SEL function introduced in equation (3.16). For every pair of L and L' 
functions, where L will be either the Q-SEL or QR-SEL functions and L' will be either the SSEL, 
WRSEL, CB or GR loss functions. In addition, we will also consider the use of the RoPQ and the 
DoPQ, as introduced in equation (3.15) and (3.18), when evaluating the QR-SEL and IQR-SEL 
functions, respectively. 



3.3 Non-spatial Simulations 
3.3.1 Design 

The proposed qiiantilc and QR estimators were evaluated using synthetic datascts. The main 
objective of these simulations was to assess the influence of various factors on quantile estimation, 
and more specifically on the estimation of the QR of the posterior ensemble. Three factors were 
of particular interest. Firstly, (i) the size of the parameter ensemble was hypothesised to have 
a significant impact on the quality of the estimation, since as the number of variables in the 
ensemble goes to infinity, the quantiles of that ensemble distribution become better identified. In 
addition, (ii) we varied the level of heterogeneity of the sampling variances of the elements in 
the parameter ensemble. Finally, (iii) we were also concerned with assessing the sensitivity of 
empirical quantile and QR estimation on distributional assumptions. Two different hierarchical 
models were therefore compared. In this set of simulations, the models used to generate the data 
was also used to estimate the different posterior quantities of interest. That is, the generative and 
fitted models were identical. 
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3.3.2 Generative Models 



The data were simulated using two different BHMs, which were partially described in section 2.2.1. 
We here give a full description of these models and our specific choices of hyperparameters. We 
tested for the effect of skewness on the estimation of the parameter ensemble quantiles and QR 
by comparing a compound Gaussian and a compound Gamma model. In the first case, the prior 
distribution on the 9i's is symmetric, whereas in the latter case, the prior on the 9i's is skewed. For 
convenience, conjugacy between the likelihood and the prior was respected for these two models 
(Bernardo and Smith, 1994). Non-conjugate models will be studied in section 3.4, for the case of 
spatial priors. These two simulation models and the specific values given to the hyperparameters 
can be regarded as a replication the first part of the simulation design conducted by Shen and 
Louis (1998). The models of interest in this section can be described as follows. 

i. For the compound Gaussian or Normal-Normal (N-N) model, we had the following likelihood 

and prior, 

y/S?iV(0„af), e,'^' N{^,o,r^), (3.19) 

respectively, for every i = l,...,n. Standard application of Bayes' rule gives 0i\yi 
-^(^P^^,T-o7i), with ef^^^ := 7iMo + (l-7i)yj, and where 7^ := erf /{erf +t^) are the shrink- 
age parameters. The MLEs for this model are given by = for every i = 1, . . . , n. 
In every simulation, the hyperparameters controlling the prior distribution of the ^^'s were 
given the following specification: /xq = 0, and Tq = 1, as in Shen and Louis (1998). Note that 
this simulation setup is somewhat artificial since Tq is fixed to a particular value, instead of 
being estimated from the data. However, this choice of specification has the advantage of 
directly reproducing the simulation study of Shen and Louis (1998). 

ii. For the compound Gamma or Gamma-Inverse Gamma (G-IG) model, the likelihood function 

and prior take the following form, 

j/j '~ Gam(aj, Oi), 9i ~ Inv-Gam(ao, Po), (3-20) 

for every i = 1, . . . ,n. The Gamma and Inverse-Gamma distributions were respectively given 
the following specifications: 

Gam(a,,eO = ^—y-^-^-y^^'^ (3.21) 
with ttj, 9i > for every i = 1, . . . ,n, and 

aaa 

Inv-Gam(ao, Po) = ^^^0^"°''^'^"^'' ^ (3-22) 

with ao,l3o > 0. From the conjugacy of the prior, we obtain the following posterior dis- 
tribution, 9i\yi Inv-Gam(a, + ao,yi + /3o), and the MLEs are 0¥^^ = y^/ai for every 
i = 1, . . . ,n. All simulations were based on the following choices of the hjrperparameters: 
ttp = 4, and Pq = 3, as in Shen and Louis (1998). 

Detailed descriptive statistics of the synthetic data generated by these models are reported in tables 
A.l and A. 2 in appendix A, for the simulated observations, and variance parameters, respectively. 

3.3.3 Simulation Scenarios 

In addition to the different models tested, two other factors were manipulated when producing 
these synthetic data sets. Firstly, in order to evaluate the effect of the size of the ensemble 
distribution on the quality of the classification estimates, we chose three different sizes of parameter 
ensemble. Specifically, in the BHMs of interest, the vector of observations and the vector of 
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parameters are both of size n. In these simulations, we chose n to take the following values: 

n e {100,200,1000}. (3.23) 

The second experimental factor under consideration was the amount of variability associated with 
each data point, i.e. the sampling variances of the ?/j's. This aspect of the synthetic data sets 
was made to vary by selecting different ratios of the largest to the smallest (RLS, in the sequel) 
sampling variances. Formally, 

2 

RLS(o-) := and RLS(a) := (3.24) 

o-(i) a(i) 

for the compound Gaussian and compound Gamma models, respectively. Thus, different choices 
of the vectors cr = {ci, . . . , (t„} and a = {ai . . . , a„} produced different levels of that ratio. We 
generated the scaling parameters similarly in the two models in order to obtain three comparable 
levels of shrinkage. The sampling variances in both models were produced using 

log(a2)'^Unif(-Q,a), and log(ai) ~ Unif(-a, C;), (3.25) 

for the compound Gaussian and compound Gamma models, respectively, and where Z = 1, . . . , 3. 
Different choices of the parameters Ci G {.01, 1.5. 2.3} approximately generated three levels of RLS, 
such that both RLS(tr) and RLS(a) took the following values {1,20, 100}, where small values of 
RLS represented scenarios with small variability in sampling variances. 

In spite of these modifications of n and RLS, the empirical mean of the ensemble distributions 
of the true OiS was kept constant throughout all simulations, in order to ensure that the simulation 
results were comparable. For the compound Gaussian model, the empirical ensemble mean was 
fixed to 0, and for the compoimd Gamma model, the empirical ensemble mc^an was fixed to 
1. These values were kept constant through adequate choices of the hyperparameters of the prior 
distribution in each model. The combination of all these factors, taking into account the 2 different 
models, the 3 diiferent sizes of the parameter ensemble and the 3 different levels of RLS resulted 
in a total of 1,800 different synthetic data sets with 100 replicates for each combination of the 
experimental factors. 

3.3.4 Fitted Models 

As aforementioned, the fitted models were identical to the generative models. For both the com- 
pound Gaussian and compound Gamma models, no burn-in was necessary since each of the 6'j's are 
conditionally independent given the fixed hyperparameters. The posterior distributions were di- 
rectly available in closed-form for the two hierarchical models as both models were given conjugate 
priors. Nonetheless, in order to compute the different quantities of interest, we generated 2,000 
draws from the joint posterior distribution of the 6iS in each model for each simulation scenario. 
The various plug-in estimators were computed on the basis of these joint posterior distributions. 
For this set of non-spatial simulations, the WRSEL function was specified using ai = 02 = 0.05 
(see section 2.4.3 on page 32). 

3.3.5 Plug-in Estimators under Q-SEL 

The results of these simulations for the Q-SEL function are presented in table 3.1 on page 45 
for both the compound Gaussian (N-N) and the compound Gamma (G-IG) models. For the 
posterior regret of each plug-in estimator, wc; have reported in parentheses the posterior regret as 
a percentage of the posterior loss under the optimal estimator. This quantity is the percentage 
regret, as described in section 2.5 on page 34. Specifically, for any ensemble of point estimates. 
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Table 3.1. Posterior regrets based on Q-SELp(0, QgL' (p)) with p := {.25, .75}, for five plug-in estimators 
and with tfie posterior expected loss of the optimal estimator in the first column. Results are presented for 
the compound Gaussian model in equation (2.20) and the compound Gamma model in equation (2.21), 
for 3 different levels of RLS, and 3 different values of n, averaged over 100 replicate data sets. Entries 
were scaled by a factor of 10^. In parentheses, the posterior regrets are expressed as percentage of the 
posterior loss under the optimal estimator. 



Scenarios Posterior regrets"' 







O-SRL 


MLE 


SSEL 


WRSEL 


CB 


GR 


RLS = 1 
















N-N, n = 


100 


24.9 


182.8 (734) 


79.3 (318) 


22.6 (91) 


8.5 (34) 


0.3(1) 


N-N, n = 


200 


12.7 


176.4 (1389) 


81.7 (643) 


355.8 (2802) 


5.3 (42) 


0.0(0) 


N-N, n = 


1000 


2.6 


158.4 (6177) 


80.4 (3134) 


3126.8 (121905) 


1.3 (51) 


0.0(1) 


G-IG, n = 


= 100 


8.8 


170.6 (1938) 


59.7 (678) 


57.6 (654) 


6.0 (69) 


0.2(2) 


G-IG, n = 


= 200 


4.4 


147.7 (3365) 


62.9 (1432) 


182.7 (4164) 


6.2 (141) 


0.1(2) 


G-IG, n = 


= 1000 


0.9 


137.2(15705) 


62.7 (7179) 


2193.1 (251039) 


5.9 (680) 


0.0(4) 


RLS = 20 
















N-N, n = 


100 


24.9 


251.0 (1009) 


95.7 (385) 


12.7 (51) 


8.8 (35) 


0.6(3) 


N-N, n = 


200 


12.4 


233.0 (1886) 


106.4 (861) 


241.6 (1955) 


7.2 (58) 


0.1(1) 


N-N, n = 


1000 


2.5 


210.7 (8327) 


97.8 (3865) 


2506.6 (99044) 


3.3 (131) 


0.0(1) 


G-IG, n = 


= 100 


8.5 


186.9 (2201) 


79.0 (930) 


73.3 (864) 


13.5 (159) 


0.2(3) 


G-IG, n = 


= 200 


4.3 


163.9 (3812) 


76.5 (1779) 


185.9 (4324) 


17.1 (398) 


0.1(2) 


G-IG, n = 


= 1000 


0.8 


156.2(18600) 


77.0 (9167) 


1979.9 (235817) 


18.2(2166) 


0.1(6) 


RLS = 100 














N-N, n = 


100 


23.7 


302.9 (1277) 


111.9 (472) 


6.2 (26) 


14.6 (62) 


0.0(0) 


N-N, n = 


200 


12.3 


254.9 (2077) 


137.3 (1118) 


131.4 (1070) 


18.0 (147) 


0.0(0) 


N-N, n = 


1000 


2.5 


253.8(10161) 


135.2 (5413) 


1903.4 (76214) 


16.1 (643) 


0.0(1) 


G-IG, n = 


= 100 


8.0 


208.4 (2592) 


76.4 (950) 


66.3 (825) 


16.5 (205) 


0.5(6) 


G-IG, n = 


= 200 


4.0 


190.2 (4698) 


79.7 (1968) 


169.0 (4174) 


17.1 (422) 


0.1(2) 


G-IG, n = 


= 1000 


0.8 


190.4(23096) 


80.3 (9744) 


1622.9 (196836) 


25.5(3090) 


0.1(8) 



" Entries for the posterior regrets have been truncated to the closest first digit after the decimal point, and 
entries for the percentage regrets have been truncated to the closest integer. For some entries, percentage 
regrets are smaller than 1 percentage point. 



0^' , we have computed the following percentage regret, 

100 X regret (Q-SELp, g^,, (p)) 

niin5E[Q-SELp(0,<5)|y] ' ^ ' ' 

where the optimal estimator is the vector of posterior empirical quartiles for p := {.25, .75}. In 
the sequel, most simulation results will be presented in tabulated format, where the first column 
will correspond to the denominator in formulae (3.26), which is here posterior expected Q-SELp, 
and the remaining columns will provide both the posterior regret and the percentage regret in 
parentheses expressed with respect to Q^Ll{v)^ for different choices of L' . 

Ovcirall, the lowest posterior regret for estimating the first and third quartiles was achievcxl 
by using the GR plug-in estimator. That is, estimates of the posterior empirical quantiles using 
the ensemble of GR point estimates was almost optimal under both models and all simulation 
scenarios. By comparing the columns of table 3.1 corresponding to the posterior regrets of each 
plug-in estimator, we can observe that the performance of the GR was followed by the ones of 
the CB, SSEL and MLE ensembles in order of increasing percentage regret for every condition 
where n > ICQ. The quartiles derived from the WRSEL ensemble outperformed the ones based 
on the SSEL and MLE sets of point estimates when n = 100, under both types of models and 
under all levels of the RLS. The WRSEL quartile estimator also surpassed the performance of 
the CB ensemble of point estimates when RLS = 100 and n = 100. However, the performance of 
the WRSEL estimator was very sensitive to the size of the parameter ensemble, and deteriorated 
rapidly as n increased, under both types of models. Figure 3.1 on page 46 illustrates the impact 
of an increase of the size of the parameter ensemble on the behaviour of the WRSEL ensemble of 
point estimates. In panel (b), one can observe that the WRSEL ensemble distribution becomes 
increasingly bimodal as n increases. This particular behaviour can be explained in terms of the 
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(a) SSEL 



(b) WRSEL 




Figure 3.1. Effect of the change in the size of the parameter ensemble on the performance of the WRSEL 
point estimates. Panel (b) shows a histogram of the ensemble of WRSEL point estimates for n = 1000 
with superimposed lines representing the same set of point estimates under n — 100, 200 and 1000. We 
have also provided the SSEL parameter ensemble in panel (a) for comparison. Results are here shown for 
a randomly chosen data set among the 100 replicates. The superimposed curves were produced using a 
Gaussian kernel over 512 equal-sized bins. 



specification of the (pi's in the WRSEL estimation procedure. Each of the (pi's is a function of 
both the parameters oi and 02 and the size of the ensemble distribution, n. We here recall the 
definition of these WRSEL weights, 

<Pi := exp |ai (^i - | + cxp j-aa (^i - | : (3-27) 

for i = 1, . . . ,n, and where for the present set of non-spatial simulations, we chose ai — a2 — 0.05. 
It can therefore be seen that as n increases, the range of the (jji's tends to increase at an exponential 
rate. Since these weights control the counterbalancing of hierarchical shrinkage in the ensemble 
distribution, it follows that an increase in the size of the ensemble yields an accentuation of such 
countershrinkage. As a result, we obtain the phenomenon visible in panel (b) of figure 3.1, where 
the units in the tails of the WRSEL ensemble distribution are pushed away from the center of the 
distribution, thus creating a bimodal distribution. This effect explains why the performance of 
the WRSEL tends to rapidly deteriorate as n increases. 

The relative performance -that is, in terms of percentage regrets- of all plug-in estimators, 
except perhaps for the GR quartile estimator, appeared to worsen under the compound Gamma 
model, in comparison to the compound Normal model. The percentage regrets of most plug-in 
estimators, indicated in parentheses in table 3.1, can be observed to be higher under the compound 
Gamma model. The sole exception to that trend was the performance of the GR plug-in quartiles 
for which no systematic trend could be identified. Overall, however, that estimator also tended to 
do slightly worse under the G-IG model, although its performance was still very close to optimal. 
A comparison of the shape of the ensemble distributions of the different point estimates studied 
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in this simulation experiment is reported in figure 3.2 on page 48. It can be observed that the 
empirical distribution of point estimates remains centered at zero under the N-N model, thereby 
matching the central tendency of the true ensemble distribution. Under the G-IG model, however, 
the skewness of the true ensemble distribution made it more difficult for plug-in estimators to 
match the values of the quantiles of the true parameter ensemble. 

Increasing the magnitude of the RLS was found to detrimentally affect the performance of the 
MLE, SSEL and CB plug-in estimators. The percentage regrets for these three families of plug-in 
estimators was found to increase systematically with the magnitude of the RLS. This effect may 
be explained in terms of the added variability associated with a high RLS. For the MLEs, this 
difference in percentage regrets is directly attributable to differences in sampling variance. For the 
SSEL and CB plug-in estimators, one may explain the detrimental effect of the RLS experimental 
factor in terms of hierarchical shrinkage. That is, units with very high sampling variance are 
subject to a larger degree of hierarchical shrinkage than cnscmbk; units with low sampling variance. 
These changes may then modify the ordering of the units and the overall shape of the ensemble 
of SSEL and CB point estimates, thereby leading to poorer performance than under low RLS. No 
similar trend could be observed for the GR and WRSEL quartile estimators. The GR quartile 
estimators seemed to be robust to changes in RLS levels, although the scale of the differences in 
performance was too small to reach any definite conclusions. For the WRSEL plug-in estimator, 
an increase in RLS was found to decrease percentage regret when considering the N-N model, but 
not necessarily under the G-IG model. 

Finally, increasing the size of the parameter ensemble resulted in a systematic increase of per- 
centage regret for all plug- in estimators, except for the quartilcs based on the triple- goal function. 
As noted earlier, this relative deterioration of the performance of the plug-in estimators was par- 
ticularly acute for the WRSEL estimators. The absolute value of the posterior loss under the 
optimal Q-SEL estimator, however, tended to decrease as n increased. Thus, while the posterior 
empirical quartiles are better estimated when the size of the parameter ensemble of interest is 
large, the relative performance of the plug-in estimators tends to decrease when their performance 
is quantified in terms of percentage regret. One should note, however, that for ensembles of point 
estimates, which optimised some aspects of the ensemble empirical distribution, the absolute pos- 
terior regrets (cf. first column of table 3.1) tended to diminish with n, even if this was not true for 
the corresponding percentage posterior regrets (cf. values in parentheses in the remaining columns 
3.1). This trend was especially noticeable for the CB and GR plug-in estimators, whose absolute 
posterior regrets can be seen to be lower under larger parameter ensembles. 

3.3.6 Plug-in Estimators under QR-SEL 

The simulation results for the QR-SEL function are reported in table 3.2, on page 49. For the 
compound Gaussian model, we computed the plug-in estimators under the IQR-SEL function. For 
this loss, the optimal estimator is the posterior IQR, and the RoPQ is replaced by the DoPQ, as 
described in equation (3.18). We considered the effect of the different experimental factors on the 
performance of the plug-in estimators, in turn. 

Overall, the ordering of the different plug-in estimators in terms of percentage regret under 
QR-SEL was found to be identical to the ordering observed under the Q-SEL function. The 
empirical QR of the MLEs tended to exhibit the largest percentage regret across all conditions 
and scenarios, except under the N-N model with n = 1000, for which the WRSEL plug-in estimator 
performed worse. As was previously noticed imder the Q-SEL, the WRSEL plug-in estimator of 
the empirical QR was found to be very sensitive to the size of the parameter ensemble. That is, 
the performance of the WRSEL plug-in estimator severely deteriorated as n grew large. We can 
explain this effect in terms of the modification of the (/)i's in the WRSEL function following an 
increase in n. Ignoring the WRSEL-based estimator, the SSEL plug-in estimator exhibited the 
second worst percentage regret after the MLE-based QR. This was followed by the CB plug-in 
estimator. The triple-goal and RoPQ were found to almost match the performance of the optimal 
estimator under the QR-SEL function over all simulated scenarios. No systematic difference in 
percentage regret could be identified between the GR and the RoPQ plug-in estimators. 
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(b) G-IG 
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Figure 3.2. Histograms of a randomly chosen simulated parameter ensemble under the compound 
Gaussian and compound Gamma models in panel (a) and (b), respectively, for n = 100 and RLS = 1. For 
each model, ensemble distributions of point estimates have been superimposed for five different estimation 
schemes. 



As for the Q-SEL function, increasing the level of heterogeneity of the sampling variances 
had a detrimental effect on the percentage regret exhibited by the MLE, SSEL and CB plug-in 
estimators. The performance of these three families of QR estimators systematically decreased 
as RLS increased (cf. table 3.2 on page 49). The plug- in estimator under the WRSEL function, 
by contrast, tended to behave in the opposite direction, whereby an increase in RLS yielded an 
increase in performance, albeit this increase was restricted to the N-N model. Both the GR and 
DoPQ/RoPQ plug-in estimators appeared to be robust to changes in RLS levels, although the GR 
estimator of the QR exhibited a substantial increase in percentage regret for very large parameter 
ensembles -i.e. for RLS ^ 100 and n — 1000, where the percentage regret for using the GR 
reached 24% of the posterior loss under the optimal estimator. 

The choice of statistical model had a strong effect on the performance of the MLE, SSEL and CB 
plug-in estimators. Consider the values of the percentage regrets in parentheses in columns three, 
five and nine of table 3.2 on page 49. For these three families of ensembles of point estimates, the 
use of a compound Gamma model yielded a larger percentage regret. For the WRSEL estimator 
of the QR, no systematic trend could be identified, as the performance of that estimator under 
the different statistical models seemed to be also dependent on the choice of RLS levels and the 
number of units in the parameter ensemble. In addition, the behaviour of the triple-goal plug-in 
estimator appeared to deteriorate under the G-IG model. The DoPQ/RoPQ plug-in estimators, 
by contrast, exhibited slightly lower percentage regret under the compound Gamma model. 

Increasing the size of the parameter ensemble tended to produce worse plug-in estimators with 
higher posterior percentage regrets. This trend can be observed for the MLE, SSEL, WRSEL 
and CB plug-in estimators. The triple-goal and DoPQ/RoPQ-based QR estimators, by contrast, 
appeared to be robust to an increase in the size of the parameter ensemble. No systematic effect 
of the size of n could be noticed for these two families of estimators. In summary, the GR and 
DoPQ/RoPQ plug- in estimators exhibited the best performance under both the Q-SEL and QR- 
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Table 3.2. Posterior regrets based on QR-SEL(0, QR(0^ )), for five plug-in estimators, for the compound 
Gaussian model in equation (2.20) and the compound Gamma model in equation (2.21), for 3 different 
levels of RLS, and 3 different values of n, averaged over 100 replicate data sets. The posterior expected loss 
of the optimal estimator is given in the first column. Entries were scaled by a factor of 10^. In parentheses, 
the posterior regrets are expressed as percentage of the posterior loss under the optimal estimator. 



Scenarios Posterior regrets'^ 



QR-SEL MLE SSEL WRSEL CB GR RoPQ 



RLS = 1 
















N-N, n = 100 


21 


329 (1565) 


151 (717) 


39 (184) 


9 (45) 


(1) 


0(2) 


N-N, n = 200 


11 


325 (3001) 


158 (1455) 


705 (6503) 


4 (38) 


(0) 


0(1) 


N-N, n = 1000 


2 


313 (14063) 


160 (7198) 


6258(281565) 


2 (76) 


(1) 


0(0) 


G-IG, n = 100 


27 


18641 (69283) 


467 (1736) 


160 (593) 


61 (226) 


(2) 


0(0) 


G-IG, n = 200 


14 


12501 (92484) 


488 (3608) 


16 (116) 


44 (328) 


(2) 


0(0) 


G-IG, n = 1000 


3 


12692 (471628) 


486 (18078) 


2098 (77969) 


50 (1871) 


(4) 


0(0) 


RLS = 20 
















N-N, n = 100 


21 


448 (2139) 


181 (866) 


20 (97) 


10 (48) 


(1) 


0(2) 


N-N, n = 200 


10 


436 (4237) 


206 (1998) 


478 (4643) 


9 (91) 


(0) 


0(1) 


N-N, n = 1000 


2 


412 (19081) 


195 (9012) 


5016(232022) 


6 (278) 


(1) 


0(0) 


G-IG, n = 100 


27 


22062 (83039) 


572 (2154) 


296 (1113) 


105 (397) 


(1) 


0(0) 


G-IG, n = 200 


13 


19019 (142416) 


565 (4228) 


63 (475) 


153(1148) 


(1) 


0(0) 


G-IG, n = 1000 


3 


17738 (675792) 


566(21555) 


850 (32365) 


176(6721) 


0(11) 


0(0) 


RLS = 100 
















N-N, n = 100 


20 


522 (2660) 


213 (1087) 


7 (34) 


18 (93) 


(0) 


0(2) 


N-N, n = 200 


10 


463 (4578) 


270 (2664) 


259 (2555) 


32 (320) 


(0) 


0(1) 


N-N, n = 1000 


2 


500 (24301) 


270(13108) 


3804(184974) 


32(1542) 


(1) 


0(0) 


G-IG, n = 100 


26 


65739 (256477) 


563 (2197) 


301 (1176) 


148 (579) 


1 (3) 


0(0) 


G-IG, n = 200 


13 


36096 (284722) 


572 (4510) 


97 (763) 


147(1160) 


(1) 


0(0) 


G-IG, n = 1000 


3 


29789(1140841) 


586 (22424) 


569 (21800) 


230 (8806) 


1(24) 


0(0) 



^ Entries for both the posterior and percentage regrets have been truncated to the closest integer. For some entries, 
percentage regrets are smaller than 1 percentage point. 



SEL functions, over all the scenarios considered. We now present a set of spatial simulations to 
evaluate the behaviour of the different families of plug-in estimators of interest under more realistic 
modelling assumptions. 

3.4 Spatial Simulations 

In this simulation study, four experimental factors were of interest. Firstly, we manipulated the 
spatial structure of the true parameter ensemble. Secondly, we considered the effect of different 
modelling assumptions. Thirdly, we modified the level of heterogeneity of the parameter ensemble, 
and finally, we assessed the influence of changing the overall level of the expected counts, in each 
scenario. 

3.4.1 Design 

The general framework for data generation followed the one implemented by Richardson et al. 
(2004). The data were generated using a subset of expected counts from the Thames Cancer 
Registry (TCR) for lung cancer in West Sussex, which contains n = 166 wards. The expected 
counts, denoted Ei's for i = 1, . . . ,n, were adjusted for age only, and correspond to the number 
of cases of lung cancer occurring among males during the period 1989-2003. The overall level of 
expected counts for lung cancer over the entire region varied from min(ii^) = 7.97 to raax{E) = 
114.35 with a mean of 42.44. The level of the expected counts was also made to vary across the 
simulations, in order to assess its influence on the performances of the different plug-in estimators. 
Specifically, we will study the; effect of three different levels of expected counts over the performance 
of the plug- in estimators of interest. 

Four different types of spatial patterns were simulated. The main objective of these simulations 
was to create a set of varying levels of difficulty for a smoothing technique, such as the CAR prior. 
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Figure 3.3. True RRs based on expected counts (SF = 1.0) for lung cancer among males between 1989 
and 2003, for 166 wards located in West Sussex, UK. The values of the RRs were generated using the 
protocols described in section 3.4.2, and are respectively denoted by SCI (one isolated cluster), SC2 (five 
isolated clusters and five isolated areas), SC3 (spatial pattern generated using the Matern function), SC4 
(spatial pattern using a hidden covariate). RRs in all four scenarios were here produced with a medium 
level of variability. 

The scenarios were thus constructed in order to include a scenario with only a single isolated area 
of elevated risk (simulation SCI, the hardest scenario for a smoothing procedure) and a situation 
containing a relatively smooth risk surface with clear spatial structure (simulation SC3). The 
spatial simulations were constructed either by randomly selecting a subset of the areas in the 
region and labelling these areas as elevated-risk areas (SCI and SC2), or by directly modelling 
the generation of the RRs ascribed to each area (SC3 and SC4). The construction of the spatial 
scenarios occurred only once, and the overall level of the expected counts was kept constant 
throughout the four spatial scenarios. 

3.4.2 Spatial Structure Scenarios 

Four different spatial structures were constructed, which respectively correspond to a situation 
with one isolated foyer of elevated risk (SCI), five isolated foyers and five isolated areas of elevated 
risk (SC2), a spatially distributed risk pattern (SC3), and the effect of a spatially structured risk 
factor (SC4). For each scenario, we describe the generation of the true RRs. In each case, three 
different levels of risk variability across the region was generated by manipulating scenario-specific 
parameters. Scenarios SCI and SC2 replicate the first two spatial simulations considered by 
Richardson et al. (2004). 

Simulation 1 (SCI). In the first scenario, a single isolated cluster of areas with elevated risk 
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was generated by choosing an area randomly, and selecting its first-degree neighbours. The 
cluster was chosen such that the total sum of the expected counts of the areas contained in 
the cluster was approximately equal to 5% of the sum of the expected counts over the entire 
region. The remaining areas in the region will be referred to as background areas. The set 
of indices of the wards in the cluster will be denoted by £, whereas the background areas 
will be denoted hy B := {i — \, . . . : i ^ £}. The level of risk in each individual ward was 
then fixed using the following rule, 

e.:^\^ (3.28) 

where LR stands for level of risk. Three different levels of elevated risk were chosen with 
LR = {1.5, 2.0, 3.0}, thereby creating different levels of RR variability in the simulated data 
sets. 

Simulation 2 (SC2). In the second scenario, we created a situation of mixed heterogeneity by 
combining both the simulation of clustered foyers of risks with the presence of isolated single 
areas with elevated risks. Firstly, five single areas were categorised as elevated risk areas. 
These areas corresponded to the 10**^, 20'^, 50**^, 75*^ and 90"^ percentiles of the empirical 
distribution of the expected counts. The indices of these five wards will be denoted by /. 
In addition, we chose five non-contiguous clusters, which did not comprise the individual 
areas of elevated risk. These foyers, denoted C^ 's with j — 1, . . . , 5, were constructed using 
the method described for the first scenario, ensuring that the cumulated expected counts in 
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Figure 3.5. Simulated SMRs based on expected counts (SF — 1.0) for lung cancer among males between 
1989 and 2003, for 166 wards located in West Sussex, UK. The values of the SMRs were generated following 
the protocols described in section 3.4.2, with a medium level of variability in the RRs. 



each cluster did not exceed 5% of the total sum of expected counts in the region. Variability 
in levels of risk was here controlled by varying LR. As in the first scenario, for every i, we 
defined the 0i's according to the rule described in equation (3.28), except that the set of 
indices of the elevated risk areas was here defined as 



(3.29) 



Note that some buffering was required in simulations SCI and SC2 in order to ensure that 
the randomly selected areas and clusters, respectively, were not adjacent or overlapping with 
each other. This was done by selecting a buffer of background areas around each of the 
cluster in SCI, and around each cluster and each of the individual areas in SC2. Note that 
in the latter case, the neighbours of the neighbours of the regions included in any one cluster 
were also excluded from further random selection, in order to ensure non-adjacent clusters. 

Simulation 3 (SC3). In a third scenario, a spatially structured risk surface was generated using 
the Matern function. Spatial structure was specified through the variance/covariance matrix, 
S, of the log^i's. We computed the following symmetric matrix of intercentroidal distances, 
denoted D, whose entries take the following values, 

Dy = \\xi - Xjlla, (3.30) 

for every pair of indices (« , j ) , with 1 1 • 1 1 2 denoting the L2-norm and where each Xj represents a 
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set of 2-dimensional coordinates locating the centroid of the i**^ region in . By convention, 
Da := for every i. We then computed the mean intercentroidal distance between all pairs 
of neighbours, defined as follows, 

1 " 

5^A,X{i~i}, (3.31) 

i<j 

where Ng is the total number of neighbouring relationships in the entire region. We used 
w to select a particular level of risk decay, as a function of the intercentroidal distance. 
Our objective was here to obtain high to moderate correlations when the intercentroidal 
distance approached iv. Spatial autocorrelations were then made to drop off to a low level 
once the distances were equal to twice the value of the mean intercentroidal distance. We 
used the Matern function to specify this form of spatial structure in the entries of the 
variance/covariance matrix, such that 

^ij = 2^-ir(^) i^^DijcprK42,/I^Dijcl>), (3.32) 

choosing (p — 3000 and ~ 40, and where K[y{-) denotes the modified Besscl function of the 
third kind of order v. The resulting matrix S was then used to draw sets of realisations 
from a multivariate normal distribution, 

log(0)~MVN(O,a2c3S), (3.33) 

with parameter CgQg controlling for the overall marginal variance of the spatial structure. 
Three different values of ctscs among {0.1, 0.2, 0.3} were selected in order to vary the amount 
of variability of the RRs. 

Simulation 4 (SC4). The last scenario produced a risk surface characterised by a high hetero- 
geneity, which was generated using a hidden covariate. We used the Carstairs' deprivation 
index (GDI) for each area in the region of interest in order to create spatial variation in levels 
of risk (Carstairs and Morris, 1989a, b, Carstairs, 2000). The logRRs were here generated 
using a linear combination of the ward-specific CDIs. Formally, we had 

log{ei) = a + pCi + Vi, (3.34) 

where Vi ~ N{0,agQi^), and Ci indicates the level of social deprivation in the i^^ ward. The 
intercept a was assumed to be null throughout the simulations. The regression coefficient, 
/3, took values in the set {0.2, 0.3, 0.4} in order to produce different levels of variability in the 
parameter ensemble, while the standard deviation, i7sc4) was fixed to 0.1 in this scenario. 
The set of values for (3 allowed the generation of scenarios with low, medium and high RR 
variability. 

The simulated observed counts in each scenario were simulated under the constraint Yl^^i Ui = 
Y^7=i (^®® Wright et al., 2003, for an example of synthetic simulations under this constraint). 
This condition was respected by generating the full vector of observed counts, y, from a multino- 
mial distribution parametrised such that the number of trials corresponded to the total number 
of expected counts -Bj, denoted Ne, and with the following vector of probability masses 

and using 

y ~ Multinomial {Ne , tt) , (3.36) 
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for each of the scenarios and under the three levels of variability. For every combination of scenario 
and variability, we produced 100 replicates, thereby totalling 1,200 different data sets. Since each 
of these data sets was fitted with two different models, the complete spatial simulations generated 
2,400 sets of joint posterior distributions of parameter ensembles. In addition, we also produced 
further simulations in order to evaluate the effect of different scaling of the expected counts. We 
employed three different values of the scaling factor (SF), such that 

SF e {0.1,1.0,2.0}. (3.37) 

That is, for each combination of the aforementioned experimental factors, we additionally varied 
the level of the expected counts 

Pois(^i^iSF), (3.38) 

for every z = 1, . . . , n, for a given vector of true ^j's, whose generation depended on the specific 
spatial scenario considered. The main simulation results in this section will be described for 
SF = 1.0. In section 3.4.6, however, we will briefly consider the consequences of scaling up or 
scaling down the expected counts for a smaller subset of simulated data, choosing SF to be either 
0.1 or 2.0. 

Typical examples of the different maps produced by this set of simulated scenarios are il- 
lustrated in Figures 3.3, 3.4, and 3.5, representing the true RRs, observed counts and standard 
mortality ratios, respectively. These maps were produced for a medium level of variability with 
LR = 2 for SCI and SC2, ct§c3 = <^SC4 = 0-2 and /3 = 0.3, for SC3 and SC4. The full details of the 
simulated data sets are given in tables B.l, B.2 and B.3, on pages 106, 106 and 107, respectively, 
in appendix B. These tables also include descriptive statistics for simulated data based on different 
scaling of the expected counts. 

3.4.3 Fitted Models 

The synthetic data sets were modelled using the two spatial BHMs described in section 2.2.2. 
These models arc characterised by the combined use of spatially strTictured and unstructured 
random effects. This particular combination of random effects was originally introduced by Besag 
ct al. (1991), and is therefore sometimes referred to as the BYM model (sec also Best et al., 
2005). The CAR Normal and CAR Laplace models were fitted using WinBUGS 1.4 (Lunn et al., 
2000) with 4,000 iterations including 2,000 burn-in. In this software, the m^'s in the CAR priors 
are constrained to sum to zero (Spiegelhalter et al., 2000). The non-stationarity of the model, 
however, is induced by specifying a flat prior on a, which is the intercept for the log ^j's in equation 
(2.27). That is, we have 

a ~ Unif (-0O, +oo). (3.39) 

This particular formulation of the CAR distribution was suggested by Besag and Kooperberg 
(1995). The CAR Normal and CAR Laplace models are fully specified once the hyperparameters 
and in equations (2.28) and (2.29), respectively, are given hyperpriors. In our case, these 
two parameters were given the following 'vague' specification, 

~ Gam(ao, 6o), t'"^ ~ Gam(oo, 6o), (3.40) 

where we chose ao := .5 and 6o := .0005, which constitutes a common specification for BHMs 
(Gelman, 2006). Note, however, that different choices of hyperparameters may yield different 
posterior distributions of the parameters of interest. The WinBUGS codes used for these two 
models is reported in appendix C. We now consider in greater detail the various aspects of a 
parameter ensemble in a BHM that may be of interest, and how estimation of these aspects may 
be optimised. 

For each simulated scenario, we computed the posterior regrets under the Q-SEL and QR-SEL 
functions, as described in section 3.2.3. In both cases, we estimated these loss functions with 
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respect to the vector of RRs, 6. For the Q-SEL function, this gave 

regret (Q-SELp,Qg,,(p)). (3.41) 

for different plug-in estimators, , and where the optimal estimator under this loss function is 
the vector of posterior empirical p-quantiles as reported in equation (3.7). Similarly, the following 
posterior regret was calculated to quantify the estimation of the QR using ensembles of sub-optimal 
point estimates, 

regret (qR-SEL, QR(^^' )) , (3.42) 

defined with respect to QR-SEL(0, 6), as described in section 3.2.3. Note that we have here chosen 
to compute both the empirical quantiles and the QR on the scale of the RR. However, we could 
have chosen to evaluate these quantities with respect to the logRRs. In section 3.6, we discuss the 
implications of such a change of parametrisation on the decision-theoretic problems of interest. 

All the results presented in the following sections are expressed as posterior expected losses, 
which are functions of the joint posterior distribution of the parameters under scrutiny. Since 
these evaluative criteria are highly dependent on the type of hierarchical models used for the 
evaluation, we have also computed the mean squared error of the QR with respect to the true QR, 
based on the simulated RRs. For the BYM model, we found that the optimal QR estimate (i.e. 
the posterior mean of the empirical QR) was on average at a 0.082 square distance from the true 
QR. That discrepancy was slightly higher for the Laplace model, at 0.087, when averaged over all 
simulated scenarios. 

3.4.4 Plug-in Estimators under Q-SEL 

The results of the spatial data simulations for the estimation of the parameter ensemble's quantiles 
arc presented in table 3.3, on page 56, for the CAR normal (denoted BYM in the table) and the 
CAR Laplace (denoted LI in the table). As for the non-spatial simulations, we reported the 
percentage posterior regrets in parentheses for each plug-in estimator. Note that for these spatial 
simulations, the parameters controlling the size of the (f>i's in the WRSEL function were given the 
following specification, ai = 02 = 0.5, which is a symmetric version of the original specification 
used by Wright et al. (2003) for similar spatial models. 

Overall, the GR plug-in estimator of the first and third quartiles outperformed the four other 
plug-in estimators across all spatial scenarios and levels of the different experimental parameters. 
The quartiles derived from the ensemble of CB point estimates tended to exhibit the second best 
performance on most scenarios, followed by the quartiles of the SSEL and WRSEL ensemble 
distributions, with the WRSEL displaying the worse performance, as can be observed in column 
six of table 3.3, on page 56. The MLE-based empirical quartiles were poor when we considered 
SC3 and SC4 as spatial scenarios. However, the MLE plug-in estimates outperformed the SSEL, 
WRSEL and CB ensemble quartiles under scenarios SCI and SC2, for simulations with medium 
to high heterogeneity of the true RRs. 

The most important experimental factor was the use of different scenarios, as can be observed 
in table 3.3 by comparing the SC1/SC2 lines with the SC3/SC4 lines of each section of the 
table. In particular, SCI and SC2 tended to produce very different outcomes from the ones 
obtained for the SC3 and SC4 synthetic data sets. One should note that the SCI and SC2 
spatially structured scenarios are somewhat artificial because the true distributions of the level 
of risks in these scenarios is discrete: each RR can only take one of two values. This produced 
some particularly counter-intuitive results when comparing the performances of different plug-in 
estimators. 

In figure 3.6 on page 57, the typical ensemble distributions for different choices of point es- 
timates are reported under the four spatial scenarios. Despite the discrete nature of the true 
distributions in scenarios SCI and SC2, all ensembles of point estimators behaved similarly in 
terms of level of shrinkage. That is, the WRSEL and SEL functions exerted a moderate to high 
level of shrinkage towards the prior mean, whereas the CB and GR point estimators resulted in an 
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Table 3.3. Posterior regrets based on Q-SELp(0, QgL' (p)) with p := {.25, .75}, for five plug-in estimators, 
and with the posterior expected loss of the optimal estimator in the first column. Results are presented 
for three different levels of variability, with SF = 1.0 and for four spatial scenarios: an isolated cluster 
(SCI), a set of isolated clusters and isolated areas (SC2), highly structured spatial heterogeneity (SC3), 
and a risk surface generated by a hidden covariate (SC4). Entries, averaged over 100 replicate data sets in 
each condition, arc scaled by a factor of 10^ with posterior regrets expressed as percentage of the posterior 
loss under the optimal estimator in parentheses. 
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99.3 (12136) 
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0.1 (8) 
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0.6 


1.6 
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3.3 
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1.3(195) 


0.1(10) 


BYM-SC2 


1.3 


2.7 
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235.5 (17548) 
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(65) 
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^ Entries for the posterior regrets have been here truncated to the closest first digit after the decimal point, 
whereas entries for the percentage regrets have been truncated to the closest integer. 



ensemble distribution eloser to the true ensemble of the RRs. The MLEs, by contrast, tended to be 
over-dispersed. In effect, the ordering of the different families of point estimators in terms of level 
of shrinkage was maintained. Therefore, the estimators retained their typical properties. However, 
it is these properties that have changed in their level of desirability. While the over-dispersion 
of the MLEs is generally a disadvantage in terms of the estimation of the quantiles of the true 
ensemble distribution; for SCI and SC2, it has now become one of the desirable properties of a 
point estimator because the true RRs arc artificially over-dispersed in these two scenarios. These 
observations account for the superiority of the quartiles based on the ensemble distribution of the 
MLEs in comparison to the SSEL, WRSEL and CB plug-in estimators for these scenarios. 

The level of variability of the true ensemble of RRs was also found to have a substantial effect 
on the ordering of the different plug-in estimators in terms of percentage regret, although this effect 
was mediated by the choice of spatial scenarios. For the discrete two-category spatial scenarios - 
SCI and SC2- the quartiles of the ensemble distributions of the SSEL, WRSEL and CB estimators 
were detrimentally affected by an increase in the heterogeneity of the true parameter ensemble. 
The MLE-based empirical quartiles, by contrast, appeared to improve their performance as the 
true RRs became more dispersed, although this trend was mainly restricted to the SCI scenario. 
For SC3 and SC4, however, the effect of increasing the variability of the RRs tended to result 
in a decrease in percentage regret for most plug-in estimators, although systematic trends were 
difficult to isolate. Under SC4, the MLE, SSEL and WRSEL plug-in estimators were positively 
affected by an increase in the heterogeneity of the RR ensemble. 
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Figure 3.6. Histograms of simulated parameter ensembles, under scenarios SCI, SC2, SC3 and SC4 in 
panels (a), (b), (c) and (d), respectively for a low variability specification. Distributions of point estimate 
ensembles have been superimposed using different coloured lines, and are based on posterior distributions 
produced under the CAR Normal model. For SCf and SC2, in panels (a) and (b), the true parameter 
ensemble of the RRs is a discrete distribution on solely two values. 

The choice of the CAR prior -that is, the contrast between the BYM model with its robust 
version based on the Laplace distribution- mainly affected the SSEL, WRSEL and CB plug-in 
estimators, whose performance was worsened when using the CAR Laplace prior, under the SC3 
and SC4 simulation scenarios. We note, however, that this trend was not completely verified for 
the CB quartile estimator for which the use of the BYM model produced worse percentage regret 
under low variability. The GR plug-in estimators, by contrast, appeared to be only marginally 
affected by the choice of prior. The use of the Laplace prior slightly increased the posterior 
expected loss (cf. first column of table 3.3, on page 56) associated with the optimal estimator 
under the Q-SEL function, thereby indicating that the use of such a model leads to estimators of 
empirical quartiles, which are inferior to the ones obtained when specifying a CAR Normal prior. 

3.4.5 Plug-in Estimators under QR-SEL 

The posterior regrets associated with the QR-SEL function for different plug-in estimators and 
different levels of experimental factors are presented in table 3.4, on page 58. The results for the 
QR-SEL function were found to follow the pattern described for the Q-SEL function. As in the 
preceding section, the most important experimental factor was the choice of simulation scenario. 
Although the over-dispersion of the MLEs was found to be disadvantageous when estimating the 
QR under SCI and SC2 with a low level of variability in the true RRs, this property became 
advantageous as the variability of the RRs increased. The converse was true for the QRs of the 
SSEL, WRSEL and CB ensemble distributions. That is, these three families of plug-in estimators 
produced QRs associated with greater percentage regret under SCI and SC2, as the level of 
variability increased. As for the Q-SEL, the triple-goal plug-in estimator was found to outperform 
all the other ensembles of point estimates across all scenarios and experimental conditions. For 
the QR-SEL, we also evaluated the posterior regret associated with the use of the RoPQ. This 
particular choice of estimator yielded a quasi-optimal performance with percentage regrets lower 
than one percentage point under all scenarios considered. 
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Table 3.4. Posterior regrets based on QR-SEL(0, QR(0^ )) for six plug-in estimators, and with the 
posterior expected loss of the optimal estimator in the first column. Results are presented for three 
different levels of variability, with SF = 1.0 and for four spatial scenarios: an isolated cluster (SCI), a 
set of isolated clusters and isolated areas (SC2), highly structured spatial heterogeneity (SC3), and a risk 
surface generated by a hidden covariate (SC4). Entries, averaged over 100 replicate data sets in each 
condition, are scaled by a factor of 10^ with posterior regrets expressed as percentage of the posterior loss 
under the optimal estimator in parentheses. 
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(341) 


8 (132) 


5694 (93879) 


4 (65) 





(2) 


0(0) 


L1-SC4 


3 


17 


(574) 


8 (252) 


2569 (86403) 


1 (47) 





(1) 


0(0) 


High Variab. 




















BYM-SCl 


1 


1 


(99) 


13 (1409) 


149 (16321) 


10(1043) 





(2) 


0(0) 


BYM-SC2 


6 


16 


(288) 


63 (1128) 


76 (1357) 


52(927) 





(1) 


0(0) 


BYM-SC3 


8 


18 


(215) 


5 (65) 


4858 (58570) 


10(122) 





(1) 


0(0) 


BYM-SC4 


5 


19 


(398) 


5 (108) 


3930 (83209) 


2 (34) 





(1) 


0(0) 


Ll-SCl 


1 


2 


(172) 


13 (1211) 


175 (16350) 


9(873) 





(1) 


0(0) 


L1-SC2 


6 


24 


(379) 


67 (1077) 


103 (1658) 


54(870) 





(1) 


0(0) 


L1-SC3 


9 


26 


(280) 


7 (77) 


7609 (82689) 


6 (70) 





(1) 


0(0) 


L1-SC4 


5 


17 


(354) 


8 (155) 


4772 (97158) 


2 (40) 





(1) 


0(0) 



Entries for both posterior and percentage regrets have been here truncated to the closest integer. For some 
entries, percentage regrets are smaller than 1 percentage point. 



As with quantile estimation, the use of the CAR Laplace prior resulted in higher posterior 
expected losses when considering the optimal estimator of the QR, and also tended to produce 
substantially larger percentage posterior regrets for the SSEL, WRSEL and CB plug-in estimators. 
In summary, this spatial simulation study has shown that the different plug-in estimators of interest 
tend to behave similarly under both the Q-SEL and QR-SEL functions. As for the non-spatial 
study, the triple-goal estimators were found to exhibit the lowest amount of percentage regret 
across all the simulation scenarios considered. 

3.4.6 Consequences of Scaling the Expected Counts 

Simulation results for two different scalings of the expected counts arc reported in table 3.5 on 
page 59 for the Q-SEL function and in table 3.6 on page 60 for the QR-SEL function. As in the 
previous sections, percentage regrets with respect to the posterior loss under the optimal estimator 
is reported in parentheses in the two tables. 

For both the Q-SEL and QR-SEL functions, the use of smaller expected counts resulted in 
a substantial increase of the posterior losses associated with the use of the optimal estimators 
under all simulated scenarios. Similarly, the absolute posterior regrets associated with the use of 
the MLE, SSEL, WRSEL and CB plug-in estimators tended to be larger when applying a scaling 
factor of 0.1, in comparison to the use of a scaling factor of SF = 2.0. In terms of percentage 
regret, however, this trend was reversed for the GR plug-in estimators under both the Q-SEL and 
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Table 3.5. Posterior regrets based on Q-SELp(0, Qql' (p)) with p := {.25, .75}, for five plug-in estimators, 
and with the posterior expected loss of the optimal estimator reported in the first column. Results are 
presented for three different levels of variability, and for four spatial scenarios: an isolated cluster (SCI), 
a set of isolated clusters and isolated areas (SC2), highly structured spatial heterogeneity (SC3), and a 
risk surface generated by a hidden covariate (SC4). Entries, averaged over 100 replicate data sets in each 
condition, are scaled by a factor of 10^ with posterior regrets expressed as percentage of the posterior loss 
under the optimal estimator in parentheses. 



Scenarios Posterior regrets'^ 

Q-SEL MLE SSEL WRSEL CB GR 

SF = 0.1 Low Variab. 



BYM-SCl 


4.2 


216.6 


(5163) 


2.8 


(66) 


1.9 


(46) 


0.3 (7) 


0.2 (4) 


BYM-SC2 


4.1 


231.5 


(5614) 


3.2 


(78) 


2.5 


(60) 


0.3 (8) 


0.1 (1) 


BYM-SC3 


6.3 


138.8 


(2197) 


3.0 


/ A'7\ 

(47) 


3.0 


f A'7\ 

(47) 


1.7 (27) 


0.1 (1) 


BYM-SC4 


5.5 


211.6 


(3827) 


6.0 


(109) 


4.9 


(88) 


1.2 (21) 


0.1 (2) 


Ll-SCl 


4.3 


222.3 


(5136) 


2.7 


/CI \ 

(61) 


2.0 


(45) 


0.3 (6) 


0.2 (4) 


L1-SC2 


4.1 


243.9 


(5912) 


2.9 


(70) 


2.3 


(55) 


0.3 (7) 


0.2 (5) 


L1-SC3 


8.6 


200.1 


(2335) 


7.6 


(89) 


6.7 


(78) 


0.9 (11) 


0.3 (4) 


L1-SC4 


6.0 


235.7 


(3900) 


7.3 


(121) 


6.1 


(101) 


0.5 (8) 


0.2 (3) 


SF = 0.1 Med. Variab. 


















BYM-SCl 


5.7 


189.2 


(3332) 


6.4 


(114) 


4.7 


(82) 


1.7 (30) 


0.1 (1) 


BYM-SC2 


8.6 


166.6 


(1941) 


13.9 


(162) 


12.3 


(144) 


1.5 (18) 


0.3 (3) 


BYM-SC3 


7.4 


92.9 


(1261) 


4.5 


/CM 

(61) 


3.5 


(48) 


3.2 (43) 


0.1 (1) 


BYM-SC4 


7.4 


155.8 


(2115) 


14.0 


(190) 


10.1 


(137) 


2.8 (38) 


0.1 (2) 


Ll-SCl 


4.8 


231.2 


(4810) 


4.9 


(102) 


3.9 


(80) 


0.5 (9) 


0.2 (4) 


L1-SC2 


6.7 


171.0 


(2559) 


18.7 


(280) 


15.3 


(229) 


1.1 (17) 


0.1 (2) 


L1-SC3 


7.3 


154.7 


(2115) 


23.0 


(314) 


20.1 


(275) 


3.5 (48) 


0.1 (2) 


L1-SC4 


8.0 


175.4 


(2184) 


17.1 


(213) 


14.2 


(177) 


1.5 (19) 


0.3 (4) 


SF = 0.1 High Variab. 


















BYM-SCl 


6.1 


103.9 


(1706) 


17.7 


(291) 


14.5 


(238) 


6.4(104) 


0.1 (1) 


BYM-SC2 


8.3 


78.2 


(939) 


26.0 


(312) 


24.0 


(288) 


7.9 (95) 


0.1 (1) 


BYM-SC3 


7.5 


73.6 


(986) 


6.6 


(89) 




(98) 


a 'J ( C\C\\ 


U.l (1) 


BYM-SC4 


7.6 


103.9 


(1364) 


11.8 


(155) 


9.2 


(121) 


3.3 (43) 


0.1 (1) 


Ll-SCl 


6.7 


89.3 


(1329) 


27.7 


(413) 


23.2 


(345) 


b.U (89) 


0.1 (1) 


L1-SC2 


8.4 


75.8 


(898) 


38.9 


(462) 


39.9 


(473) 


10. i (121) 


0.1 (1) 


L1-SC3 


10.1 


109.5 


(1079) 


17.7 


(175) 


18.1 


(178) 


5.8 (58) 


0.3 (3) 


L1-SC4 


7.3 


116.2 


(1589) 


32.0 


(438) 


28.0 


(383) 


5.0 (68) 


0.1 (1) 


SF = 2 Low 


Variab. 


















BYM-SCl 


0.3 


1.6 


(540) 


1.3 


(429) 


1.0 


(344) 


0.5(159) 


0.1(25) 


BYM-SC2 


0.5 


1.3 


(260) 


2.7 


(516) 


1.6 


(302) 


1.4(279) 


0.1(13) 


BYM-SC3 


0.7 


1.3 


(191) 


0.3 


(44) 


0.2 


(26) 


0.3 (44) 


0.1 (9) 


BYM-SC4 


0.6 


1.5 


(219) 


0.7 


(112) 


0.5 


(89) 


0.2 (39) 


0.0 (8) 


Ll-SCl 


0.3 


1.6 


(461) 


1.4 


(419) 


1.0 


(307) 


0.5(143) 


0.1(16) 


L1-SC2 


0.5 


1.3 


(231) 


2.6 


(467) 


2.0 


(373) 


1.1(195) 


0.1 (9) 


L1-SC3 


0.7 


1.7 


(250) 


0.5 


(66) 


0.4 


(54) 


0.4 (56) 


0.1 (9) 


L1-SC4 


0.6 


1.5 


(237) 


0.7 


(108) 


1.0 


(152) 


0.1 (13) 


0.1(15) 


SF = 2 Med. 


Variab. 


















BYM-SCl 


0.3 


0.2 


(74) 


1.8 


(552) 


1.5 


(464) 


1.1(327) 


0.1(21) 


BYM-SC2 


1.2 


3.1 


(270) 


8.8 


(755) 


6.3 


(541) 


7.7(664) 


0.1 (5) 


BYM-SC3 


1.1 


1.5 


(129) 


0.5 


(41) 


0.3 


(25) 


0.5 (48) 


0.1 (5) 


BYM-SCl 


0.8 


1.1 


(142) 


0.3 


(38) 


0.2 


(25) 


0.3 (32) 


0.1 (7) 


Ll-SCl 


0.4 


0.7 


(197) 


1.9 


(518) 


1.6 


(445) 


1.0(274) 


0.1(26) 


L1-SC2 


1.2 


4.3 


(316) 


7.7 


(623) 


7.1 


(576) 


6.4(521) 


0.0 (4) 


L1-SC3 


1.2 


2.1 


(175) 


0.7 


(62) 


0.6 


(49) 


0.8 (69) 


0.1 (5) 


L1-SC4 


0.9 


1.2 


(136) 


0.5 


(59) 


0.5 


(61) 


0.3 (32) 


0.1 (6) 


SF = 2 High Variab. 


















BYM-SCl 


0.4 


0.3 


(73) 


2.3 


(618) 


1.9 


(489) 


1.7(449) 


0.1(19) 


BYM-SC2 


1.7 


8.7 


(505) 


12.4 


(717) 


10.7 


(621) 


12.1(701) 


0.1 (4) 


BYM-SC3 


1.3 


0.8 


(61) 


0.5 


(40) 


0.6 


(45) 


0.6 (46) 


0.1 (9) 


BYM-SC4 


1.1 


1.6 


(150) 


0.5 


(43) 


0.3 


(32) 


0.4 (33) 


0.1 (6) 


Ll-SCl 


0.7 


0.7 


(100) 


2.2 


(333) 


1.5 


(232) 


1.6(241) 


0.1(12) 


L1-SC2 


1.9 


9.2 


(485) 


12.8 


(676) 


11.5 


(610) 


12.5(660) 


0.1 (5) 


L1-SC3 


1.5 


0.8 


(56) 


0.8 


(56) 


0.6 


(40) 


0.7 (47) 


0.2(12) 


L1-SC4 


1.1 


1.7 


(149) 


0.3 


(31) 


0.5 


(44) 


0.4 (32) 


0.1 (5) 



Entries for the posterior regrets have been truncated to the closest first digit after tiie decimal point, and 
entries for the percentage regrets have been truncated to the closest integer. 
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Table 3.6. Posterior regrets based on QR-SEL(0, QR(0^ )) for six plug-in estimators, and with the 
posterior expected loss of the optimal estimator in the first column. Results are presented for three different 
levels of variability, and for four spatial scenarios: an isolated cluster (SCI), a set of isolated clusters and 
isolated areas (SC2), highly structured spatial heterogeneity (SC3), and a risk surface generated by a 
hidden covariate (SC4). Entries, averaged over 100 replicate data sets in each condition, are scaled by 
a factor of 10^ with posterior regrets expressed as percentage of the posterior loss under the optimal 
estimator in jjarontlicscs. 



Scenarios Posterior regrets"' 





QR-SEL 


MLE 


SSEL 


WRSEL 


CB 


GR 


RoPQ 


SF = 0.1 Low Variab. 






















BYM-SCl 


3 


945 (28427) 


7 


(202) 


5 


(138) 


(14) 


(9) 





(0) 


BYM-SC2 


3 


992 (30596) 


8 


(253) 


6 


(196) 


1 (23) 


(4) 





(0) 


BYM-SC3 


11 


783 


(6819) 


10 


(83) 


9 


(74) 


4 (38) 


(1) 





(0) 


BYM-SC4 


6 


1005 (15697) 


17 


(264) 


14 


(216) 


1 (17) 


(3) 





(0) 


Ll-SCl 


4 


964 (25898) 


6 


(175) 


5 


(129) 


(11) 


0(10) 





(0) 


L1-SC2 


3 


1037 (29816) 


7 


(198) 


5 


(156) 


(13) 


0(11) 





(0) 


L1-SC3 


17 


1047 


(6098) 


25 


(143) 


20 


(119) 


2 (11) 


1 (8) 





(0) 


L1-SC4 


9 


1070 (12516) 


21 


(243) 


17 


(201) 


1 (10) 


1 (6) 





(0) 


SF = 0.1 Med. Variab. 






















BYM-SCl 


7 


869 (12769) 


17 


(254) 


13 


(187) 


3 (42) 


(1) 





(0) 


BYM-SC2 


18 


957 


(5237) 


51 


(279) 


44 


(243) 


2 (13) 


1 (7) 





(0) 


BYM-SC3 


18 


584 


(3219) 


18 


(102) 


15 


(84) 


10 (57) 


(1) 





(0) 


BYM-SC4 


13 


855 


(6678) 


46 


(361) 


34 


(265) 


5 (42) 


(2) 





(0) 


Ll-SCl 


5 


988 (18505) 


13 


(251) 


11 


(197) 


1 (19) 


(8) 





(0) 


L1-SC2 


13 


957 


(7170) 


70 


(524) 


56 


(416) 


3 (21) 


(3) 





(0) 


L1-SC3 


16 


832 


(5207) 


97 


(605) 


79 


(493) 


13 (83) 


(3) 





(0) 


L1-SC4 


16 


894 


(5756) 


58 


(375) 


46 


(299) 


4 (23) 


1 (8) 





(0) 


SF = 0.1 High Variab. 






















BYM-SCl 


9 


564 


(6583) 


60 


(697) 


49 


(575) 


18(209) 


(2) 





(0) 


BYM-SC2 


25 


629 


(2545) 


144 


(581) 


141 


(571) 


20 (83) 


(1) 





(0) 


BYM-SC3 


23 


528 


(2276) 


26 


(111) 


30 


(128) 


33(144) 


(1) 





(0) 


BYM-SC4 


17 


736 


(4381) 


52 


(312) 


41 


(245) 


2 (9) 


(1) 





(0) 


Ll-SCl 


12 


455 


(3806) 


101 


(844) 


82 


(689) 


17(144) 


(2) 





(0) 


L1-SC2 


27 


603 


(2203) 


220 


(802) 


221 


(808) 


34(125) 


1 (2) 





(0) 


L1-SC3 


30 


748 


(2533) 


86 


(291) 


84 


(285) 


18 (62) 


1 (5) 





(1) 


L1-SC4 


16 


713 


(4374) 


144 


(882) 


123 


(752) 


15 (94) 


(1) 





(0) 


SF = 2 Low 


Variab. 






















BYM-SCl 





4 


(1541) 


3 (1158) 


2 


(936) 


1(363) 


0(52) 





(0) 


BYM-SC2 


1 


3 


(373) 


7 


(889) 


4 


(566) 


2(303) 


0(27) 





(0) 


BYM-SC3 


1 


4 


(313) 


1 


(50) 





(32) 


1 (50) 


(6) 





(0) 


BYM-SC4 


1 


6 


(519) 


2 


(180) 


2 


(145) 


(20) 


0(10) 





(0) 


Ll-SCl 





4 


(1135) 


3 (1017) 


3 


(751) 


1 (296) 


0(24) 





(0) 


L1-SC2 


1 


3 


(318) 


7 


(872) 


6 


(722) 


2(252) 


0(14) 





(0) 


L1-SC3 


1 


6 


(432) 


2 


(103) 


1 


(81) 


1 (86) 


0(11) 





(0) 


L1-SC4 


1 


5 


(446) 


2 


(212) 


3 


(280) 


(12) 


0(20) 





(0) 


SF = 2 Med. 


Variab. 






















BYM-SCl 





1 


(176) 


5 (1505) 


4 (1280) 


3(832) 


0(35) 





(0) 


BYM-SC2 


2 


4 


(190) 


23 (1005) 


19 


(848) 


16(717) 


(6) 





(0) 


BYM-SC3 


4 


5 


(127) 


2 


(44) 


1 


(23) 


2 (61) 


(5) 





(0) 


BYM-SC4 


2 


5 


(244) 


1 


(49) 


1 


(35) 


(26) 


(6) 





(0) 


Ll-SCl 





2 


(389) 


5 


(1260) 


4 (1096) 


2(613) 


0(51) 





(0) 


L1-SC2 


2 


7 


(268) 


21 


(847) 


21 


(864) 


13(552) 


(5) 





(0) 


L1-SC3 


4 


7 


(161) 


2 


(47) 


2 


(56) 


3 (66) 


(6) 





(0) 


L1-SC4 


2 


4 


(205) 


2 


(95) 


2 


(95) 


1 (30) 


(5) 





(0) 


SF = 2 High Variab. 






















BYM-SCl 








(118) 


6 


(1527) 


5 (1229) 


4(1063) 


0(41) 





(0) 


BYM-SC2 


4 


19 


(495) 


34 


(900) 


33 


(871) 


29(754) 


(6) 





(0) 


BYM-SC3 


6 


3 


(56) 


2 


(32) 


2 


(37) 


3 (49) 


1(10) 





(0) 


BYM-SC4 


3 


10 


(284) 


2 


(45) 


1 


(34) 


1 (20) 


(8) 





(0) 


Ll-SCl 


1 


1 


(123) 


6 


(520) 


4 


(368) 


4(349) 


0(10) 





(0) 


L1-SC2 


4 


20 


(468) 


36 


(834) 


36 


(835) 


30(700) 


(6) 





(0) 


L1-SC3 


7 


4 


(65) 


4 


(60) 


3 


(43) 


3 (50) 


1(10) 





(0) 


L1-SC4 


4 


9 


(251) 


1 


(37) 


1 


(32) 


1 (36) 


(4) 





(0) 



* Entries for both the posterior and percentage regrets have been truncated to the closest integer. For some 
entries, percentage regrets are smaller than 1 percentage point. 
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(a) CAR normal 



(b) CAR Laplace 




0.0 0.5 1.0 1.5 2.0 2.5 0.0 0.5 1.0 1.5 2.0 2.5 

RR estimates RR estimates 

Figure 3.7. Histograms of the ensembles of point estimates under SSEL for the schizophrenia prevalence 
data set under the CAR Normal and CAR Laplace models in panels (a) and (b), respectively. Smoothed 
empirical distributions of the five families of point estimates of interest have been superimposed. 



QR-SEL functions. For this family of estimators, the percentage regrets tended to increase as the 
SF became larger. This somewhat counter-intuitive result, however, may be explained in terms 
of the comparative decrease of the posterior losses based on the optimal estimators for both the 
Q-SEL and QR-SEL functions. That is, although the percentage regrets associated with the GR 
plug-in estimators at SF = 2.0 was larger than the ones at SF = 0.1, these percentages tended to 
correspond to smaller fractions of the corresponding posterior losses. 

3.5 Urban Distribution of Schizophrenia Cases 

In this section, we consider the use of different plug-in estimators for the estimation of the quantiles 
and QR of a parameter ensemble for a real data set, which describes the spatial distribution of 
cases of schizophrenia in an urban population. This particular data set was found to be especially 
relevant to the problem at hand, as it is characterised by very low expected counts, thereby 
inducing a high level of hierarchical shrinkage. 

3.5.1 Data Description 

We here re-analysed prevalence data on first-onset cases of schizophrenia collected from the South- 
east London study area of the Aetiology and Ethnicity in Schizophrenia and Other Psychoses (AE- 
SOP) study (Kirkbride et al., 2006). This is a large, population-based, epidemiological case-control 
study of first-episode psychoses conducted in Southeast London, Nottingham and Bristol. It was 
designed to investigate differential rates of psychoses across different cities and ethnic groups in 
the UK, based upon a comprehensive survey of all incident cases of first-episode psychoses. The 
data has been shown to be of a very high quality, and the AESOP study is the largest ever study 
of first episode psychoses conducted in the UK to date (see Kirkbride et al., 2006, for a detailed 
description of AESOP). In the analyses presented here, we solely use a subset of the AESOP data 
base, covering prevalence data from the Southeast London study centre only. 

All individuals aged 16-64 years living in the study area having had contact with mental health 
services for a first episode of any probable psychosis, non-psychotic mania or bipolar disorder 
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Figure 3.8. Maps of point estimates denoting tlie RRs for schizopiirenia in 33 borouglis of Southeast 
London. The ensembles of the MLEs, posterior means, WRSEL estimates, CB estimates, and triple-goal 
estimates are plotted in panels (a) to (e), respectively. 

were selected. Initial inclusion criteria were broad, based upon those used in the WHO 10- 
country study (Jablensky et al., 1992). Ascertainment bias was minimised by incorporating a 
wide variety of psychiatric services into the study design. Here, cases were excluded if an address 
at first presentation was not obtained. The study took place over 24 months (September 1997- 
August 1999). Subjects who passed the screen underwent the Schedules for Clinical Assessment in 
Neuropsychiatry (SCAN); a modified Personal and Psychiatric History Schedule, and a schedule 
developed to record sociodemographic data. International Classification of Diseases (10th Edition) 
(ICD-10) diagnoses were made by consensus agreement from a panel of clinicians. All cases of 
broadly defined schizophrenia, corresponding to ICD-10 F20 to F29 diagnoses were included. Full 
details of the methodology used in this study have been provided by Kirkbride et al. (2006). 

3.5.2 Models Used and Performance Evaluation 

The expected counts in this data set ranged from 2.68 to 6.86 with a mean of 4.48 and a standard 
deviation of 0.91. The observed counts ranged from to 17 with a mean of 4.48 and a standard 
deviation of 3.43. For three areas, the observed counts were null. We fitted the data with the 
spatial CAR Normal and Laplace models described in section 3.4.3, using the same specification 
for the hyperparameters as in the spatial simulation study. The BYM model considered here 
constituted one of the five different models utilised in the original analysis of this data set by 
Kirkbride et al. (2007). As before, the models were fitted using WinBUGS 1.4 (Lunn et al., 
2000). Our results are based on 50,000 MCMC samples collected after discarding 10,000 burn-in 
realisations from the joint posterior. The (pi's used in the computation of the WRSEL plug-in 
estimators were here specified using ai — a2 = 0.5, as previously done for the spatially structured 
simulations. 

We compared the values taken by the different plug-in estimators by expressing them as depar- 
ture from the value of the optimal estimator. Our strategy, here, is similar in spirit to the one used 
when computing the posterior regrets in equation (2.74), except that we were more specifically 
interested in evaluating the direction of the departure from the optimal estimate. Thus, for every 
loss function of interest L', we computed 

Qp{e^')-E[Qp{e)\y], (3.43) 

separately for five different empirical quantiles, taking p £ {.05, .25, .50, .75, .95}. In addition, we 
derived the departure of the empirical QR plug-in estimators from the posterior mean empirical 
QR. Thus, we calculated 

QR(0^')-E[QRW|y], (3.44) 

for the five ensembles of point estimates under scrutiny. The RoPQ was also derived for this data 
set and compared to the posterior mean empirical QR in a similar fashion. 
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3.5.3 Results 



In figure 3.7 on page 61, the ensemble distributions of the different families of point estimates have 
been plotted. In this figure, the histograms represent the ensembles of the posterior means under 
the CAR Normal and CAR Laplace models. The empirical distributions of other ensembles of 
point estimates have been superimposed as densities. As expected, the empirical distribution of 
the MLEs was found to be more dispersed than the ensemble distributions of the four other sets 
of point estimates. In particular, the ensemble distributions of the SSEL and WRSEL estimators 
were similar, and were found to be especially sensitive to hierarchical shrinkage. The ensemble 
distributions of the CB and GR point estimates struck a balance between the over-dispersion of the 
MLEs and the shrinkage of the SSEL and WRSEL ensembles. Overall, the CB and GR ensemble 
distributions behaved similarly, albeit the empirical distribution of the GR point estimates tended 
to be smoother. Changes in the modelling assumptions did not modify the behaviour of the 
different families of point estimates. As can be observed from panels (a) and (b) of figure 3.7, 
the ensemble distributions obtained under the CAR Normal and CAR Laplace priors did not 
markedly differ. The maps of the ensembles of point estimates evaluated in this section have also 
been reported in figure 3.8 on page 62, for the CAR Normal model. 

A summary of the analysis conducted for this data set is provided in table 3.7 on page 64, 
where we report several summary statistics for the estimation of five different empirical quantiles 
and the empirical QR of the parameter ensembles of interest. As expected, the empirical quantiles 
of the MLE point estimates were over-dispersed relative to the optimal estimators under both 
modelling assumptions: the MLE plug-in estimates of 0(.o5) and 0(.25) were found to be smaller 
than the optimal empirical quantiles, whereas the MLE plug-in estimates of 6'( 75) and 95) were 
found to be larger than the optimal empirical quantiles. By contrast, the ensemble of posterior 
means followed exactly the opposite pattern: they were under-dispersed relative to the optimal 
estimators due to hierarchical shrinkage. That is, the SSEL-based estimates of ^(.05) and 6'(.25) 
were larger than the corresponding posterior empirical quantiles and the SSEL-based estimates of 
0(.75) and ^(.95) were smaller than their posterior empirical counterparts. The plug-in empirical 
quantiles obtained under the WRSEL, CB and GR functions were more difficult to classify, but 
we note that the CB plug-in estimators appeared to perform best overall. 

Regarding the posterior QR, table 3.7 shows that the MLE-based plug-in estimator tended 
to over-estimate the posterior mean of the empirical QR, whereas most other plug-in estimators 
were likely to under-cstimatc its value. The RoPQ also tended to be smaller than the posterior 
mean of the empirical QR. Overall, the ordering of the different plug-in estimators with respect to 
their departure from the optimal QR followed the ordering reported in our non-spatial and spatial 
simulation studies. The sole exception to this parallel was the CB-based empirical QR, which 
was observed to be slightly larger than the posterior mean of the empirical QR. These results 
were found to be robust to a change in modelling assumptions, albeit the value of the posterior 
mean of the empirical QR was found to be slightly smaller when using the CAR Laplace prior. 
Most importantly, however, one can observe that the credible intervals for the posterior empirical 
quantiles and the empirical QR were found to be larger under the CAR Laplace model, thereby 
suggesting that the estimation of these quantities was noisier for this model than for the CAR 
Normal. 

In addition, in table 3.8 on page 65, we have reported the posterior regrets for the plug-in 
estimators of interest under the Q-SEL and QR-SEL. For the estimation of the empirical quantiles 
under Q-SEL, the ordering of the plug-in estimators in terms of percentage regrets reproduced the 
one found in the simulation studies. In particular, the choice of the MLEs and WRSEL plug-in 
estimators yielded the largest posterior regrets, whereas the triple-goal plug-in empirical quantiles 
were found to be quasi-optimal. For this data set, however, the SSEL plug-in empirical quantiles 
were found to be almost identical to the ones based on the CB ensemble. Under the QR-SEL 
function, the CB plug-in estimator slightly outperformed the triple-goal plug-in estimator when 
estimating the QR. This result should be contrasted with our synthetic data simulations, which 
indicated that the triple-goal plug-in QR was generally better than the CB estimator in terms of 
posterior regret. We note, however, that in the case of the schizophrenia prevalence data set, the 
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Table 3.8. Posterior regrets based on QR-SEL(0, QR(0^')), and Q-SEL(p, 0, 0fp)) with p := {.25, .75}, 
for the schizophrenia prevalence data set. The posterior expected loss of the optimal estimator is given in 
the first column. In parentheses, posterior regrets are expressed as percentage of the posterior loss under 
the optimal estimator. 



Models 








Posterior regrets" 








Post. Loss 


MLE 


SSEL 


WRSEL 


CB 


GR 


RoPQ 


Q-SEL 

CAR Normal 
CAR Laplace 


0.04 
0.04 


0.13(342) 
0.13(341) 


0.01 (24) 
0.01 (32) 


0.15(392) 
0.16(434) 


0.01 (25) 
0.01 (32) 


0.00(1) 
0.00(1) 




QR-SEL 

CAR Normal 
CAR Laplace 


0.17 
0.18 


0.86(498) 
0.89(492) 


0.10 (55) 
0.09 (47) 


0.42(245) 
0.49(273) 


0.00 (3) 
0.01 (7) 


0.01(7) 
0.01(8) 


0.00(3) 
0.00(3) 



* Entries for the posterior regrets have been truncated to the closest second digit after the decimal point. 
Entries for the percentage regrets have been truncated to the closest integer. 



small size of the parameter ensemble may liave detrimentally affected the performance of the triple- 
goal estimator, which requires large parameter ensembles for a good estimation of the EDF. Since 
the data set under scrutiny only had only 33 regions, it may be speculated that this negatively 
affected the performance of that particular plug-in estimator. The behaviour of the remaining 
QR plug-in estimators appeared to be in agreement with the conclusions of our synthetic data 
simulations. In particular, the RoPQ was found to outperform all of its counterparts. 

Overall, this re-analysis of Kirkbride et al.'s (2007) schizophrenia prevalence data set has 
helped to characterise the heterogeneity of the empirical distribution of the RRs for schizophrenia 
in the region of interest. Optimal estimation under the QR-SEL function has yielded a QR of 
approximately 1.7, which implies that the areas in the upper quartile of risk exhibit 70% higher 
risk for schizophrenia than the areas in the lower quartile of risk. This high level of heterogeneity 
in the empirical distribution of the RRs in this region suggests the presence of unmeasured risk 
factors which may account for the between-area variability. 

3.6 Conclusion 

One of the consistent findings of this chapter has been the good performance of the empirical 
quantiles and empirical QR based on the ensemble of triple-goal point estimates. The GR plug- 
in estimators behaved well throughout the set of simulations imder both spatial and non-spatial 
scenarios. This can be explained in terms of the optimality criteria satisfied by the triple-goal 
point estimates described in section 2.4.2. The first goal that this ensemble of point estimates is 
optimising is the estimation of the EDF of the parameters of interest. Since both the empirical 
quantiles and the empirical QR are properties of the EDF, it follows that the ensembles of point 
estimates optimising the estimation of the EDF would also be expected to do well when estimating 
these empirical properties. These particular findings therefore highlight another potential benefit 
of choosing to report the ensemble of triple-goal point estimates when conducting research in 
epidemiology and spatial epidemiology. In addition, we have also noted the good performance 
of the RoPQ and DoPQ under the QR-SEL and IQR-SEL functions, respectively. These plug-in 
estimators were found to consistently outperform all other plug-in estimators in our simulation 
studies. It can therefore be concluded that when the posterior quartiles of a parameter ensemble 
have been collected, one can readily obtain a robust estimation of the dispersion of that ensemble 
by using either the RoPQ or the DoPQ, depending on the nature of the models investigated. 

This chapter has also highlighted several limitations associated with the use of the WRSEL 
function for the estimation of parameter ensembles in both spatial and non-spatial models. 
Through the manipulation of the size of the parameter ensemble in our non-spatial simulations, 
we have first observed that the range of the WRSEL weights increased exponentially with the 
size of the parameter ensemble. Naturally, this is an aspect of the WRSEL function that would 
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need to be calibrated according to the specificities of the estimation problem under consideration. 
Moreover, we have also noted that the WRSEL plug-in estimators performed poorly under the 
compound Gamma model. This is due to the fact that we have specified a vector of symmetric 
weights on the ranks of the units in the ensemble of interest. Both of these problems can be 
addressed by choosing different values of the parameters ai and 02 controlling the range of the 
WRSEL weights, despite the fact that the true empirical distribution of the parameters of interest 
was skewed. Asymmetric choices may therefore help to deal with skewed parameter ensembles, 
whereas assigning very low values to ai and 02 would counterbalance the effect of a large n. How- 
ever, since there does not currently exist an automated way of estimating there parameters from 
the properties of the data, these issues ultimately limit the applicability of the WRSEL function 
as an automated estimation procedure, since it necessitates a substantial amount of preliminary 
tuning. 

A second important conclusion of this chapter is the apparent superiority of the CAR Normal 
over the CAR Laplace model for the estimation of both the empirical quantiles and the empirical 
QR of a parameter ensemble. Our spatial simulation study has shown that the utilisation of the 
CAR Laplace model tended to increase the posterior expected loss associated with the optimal 
estimators of the ensemble's quantiles and QR in comparison to the use of the CAR Normal model. 
The CAR Laplace also yielded larger posterior regrets when using plug-in estimators. Finally, our 
analysis of the schizophrenia prevalence data set showed that the posterior distributions of the 
empirical quantiles and empirical QR were less concentrated around their means when using the 
CAR Laplace model. In practice, we therefore recommend the use of the CAR Normal model 
when the dispersion of the RRs is of special interest. However, one should note that we used 
identical specifications for the variance hyperparameters for both the Normal and Laplace CAR 
priors, which may have contributed to the comparatively poor performance of the Laplace prior 
model. In particular, note that the variance parameter, r^, for both the CAR Normal and CAR 
Laplace priors was given a Gamma prior, such that 

~ Gam(0.5, 0.0005). (3.45) 

However, identical choices of variance parameters in a Laplace and a Normal distributions do not 
result in identical amount of variability. While X ~ N{0, 1) has a variance of 1.0, a Laplace 
random variate with distribution L(0, 1) has a variance of 2.0. Thus, a fair comparison of the 
CAR Normal and CAR Laplace performances would require the use of comparative hyperprior 
distributions on the variance parameter of the ensemble priors, r^. 

In both the spatially-structured simulations and in the analysis of the schizophrenia prevalence 
data set, we have computed the Q-SEL on the scale of the RRs. That is, both the Q-SEL and 
the QR-SEL were computed using 6 as the parameter of interest. However, one should note that, 
since the true empirical distribution of the RRs tends to be skewed, the estimation of low and high 
quantiles will not be penalised symmetrically. A possible solution to this problem is to compute 
the Q-SEL posterior regret on the logscale. In our notation, this may be formulated with respect 
to the following ensemble of plug-in estimators, 

I'S^'' := jlST^f , . . . , iST^ I • (3.46) 

That is, for some loss function L' , each point estimate could be computed on the basis of the 
joint posterior distribution of the RRs on the logscale, i.e. p{\ogO\y). It then follows that the 
corresponding optimal estimator under the Q-SEL function is the set of posterior empirical p- 
quantiles of the ensemble of the log6'i's. This gives the following posterior regret, 

regret (Q-SELp , Q^^,, (p)) , (3.47) 

which is defined with respect to Q-SELp(log0, 5) with p := {.25, .75}. Note that the optimal 
estimator, 5 = lE[(5iog0(p)|y] will here differ from the optimal estimator of the Q-SEL on the RR 
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scale, which we considered in the rest of this chapter. Thus, the estimation of the Q-SEL function 
is not invariant under reparametrisation and such a change of scale produces two different decision- 
theoretic problems, which cannot be directly compared. In practice, preference for one approach 
over another will be determined by the spc;cific; needs of the modeller. 

The good performance of the triple-goal plug-in estimators under both the Q-SEL and QR- 
SEL functions indicate that this ensemble of point estimates could be useful in epidemiological 
practice. As highlighted in chapter 1, the choice of specific point estimates in epidemiology and 
most particularly in spatial epidemiology is made arduous by the wide set of desiderata that such 
an ensemble of point estimates should fulfil. Here, we have seen that, in addition to providing 
good estimation of the EDF, the ranks and the set of levels of risk; the GR point estimates also 
provide a good approximation of the amount of heterogeneity in the ensemble. Therefore, the 
triple-goal point estimates constitute a good candidate for both the mapping of area-specific levels 
of risks and the reporting of summary statistics about the overall risk heterogeneity in the region 
of interest. There are other inferential objectives, however, for which the GR point estimates 
are far from optimal. These limitations will be highlighted in the next chapter where we study 
different classification loss functions. 
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Chapter 4 

Threshold and Rank Classification 
Losses 

Summary 

In this chapter, we study the problem of the classification of a subset of parame- 
ters in an ensemble given a particular threshold. This problem can be envisaged 
from two different perspectives as either (i) a rank-based classification problem or 
(ii) a threshold-based classification problem. In the first case, the total amount of 
data points that one wishes to classify is known. We are answering the question: 
Which arc the parameters situated in the top 10% of the ensemble? By contrast, 
in the threshold-based classification, we are only given a particular threshold of 
risk, and therefore need to determine both the total number of elements above 
the pre-specified threshold, and the identity of these elements. We review some 
previous research by Lin et al. (2006), who have investigated this problem from 
the ranking perspective. We adopt a similar approach when one is concerned with 
threshold-based classification and derive the minimiser in that case. We then eval- 
uate the corresponding optimal estimators of these two families of loss functions 
using spatial and non-spatial synthetic data for both weighted and unweighted 
classification losses. Of special interest, we find that a decision rule originally 
proposed by Richardson et al. (2004), in the context of spatial epidemiology, can 
be shown to be equivalent to the specification of a weighted threshold-based clas- 
sification loss. Overall, our experimental simulations support the use of the set 
of posterior means as a plug-in estimator under balanced, but not necessarily un- 
der weighted, threshold-based classification. Finally, wc illustrate the use of our 
decision-theoretic frameworks for a surveillance data set describing prevalence of 
MRSA in UK hospitals. The good performance of the SSEL point estimates on 
both types of classification paradigms indicate that this may be a good choice of 
classifiers in practice when one wishes to deprecate the occurrence of false alarms. 

4.1 Introduction 

The problem of the optimal classification of a set of data points into several clusters has occupied 
statisticians and applied mathematicians for several decades (see Gordon, 1999, for a review). As 
is true for all statistical methods, a classification is, above all, a summary of the data at hand. 
When clustering, the statistician is searching for an optimal partition of the parameter space 
into a -generally, known or pre-specified- number of classes. The essential ingredient underlying 
all classifications is the minimisation of some distance function, which generally takes the form 
of a similarity or dissimilarity metric (Gordon, 1999). Optimal classification will then result 
from a trade-off between the level of similarity of the within-cluster elements and the level of 
dissimilarity of the bctwccn-clustcr elements. In a decision-theoretic framework, such distance 
functions naturally arise through the specification of a loss function for the problem at hand. 
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Figure 4.1. Posterior probability that parameter 9 is larger than a given threshold Ca, which is here 
¥[9 > Ca|y]. In the framework proposed by Richardson et al. (2004), the element in the ensemble 
corresponding to parameter 9 is considered to be greater than Ca if P[6 > Ca|y] > a, for some a G [0, 1]. 

The task of computing the optimal partition of the parameter space then becomes a matter of 
minimising the chosen loss function. 

In spatial epidemiology, the issue of classifying areas according to their levels of risk has been 
previously investigated by Richardson et al. (2004). These authors have shown that areas can 
be classified according to the joint posterior distribution of the parameter ensemble of interest. 
In particular, a taxonomy can be created by selecting a decision rule D{a,Ca) for that purpose, 
where Ca is a particular threshold, above and below which we wish to classify the areas in the 
region of interest. The parameter a, in this decision rule, is the cut-off point associated with 
Ca, which determines the amount of probability mass necessary for an area to be allocated to 
the above-threshold category. Thus, an area i with level of risk denoted by 9i will be assigned 
above the threshold Ca if P[^i > Ca\y] > a. Richardson et al. (2004) have therefore provided a 
general framework for the classification of areas, according to their levels of risk. This research 
has attracted a lot of interest in the spatial epidemiological literature, and has been applied to 
a wide range of epidemiological settings, including healthcare use in France (Chaix et al., 2005), 
disparities in the prevalence of psychotic disorders in London (Kirkbride et al., 2007), as well as 
the study of the prevalence of bladder cancer (Lopez- Abente et al., 2006) and breast cancer (PoUan 
et al., 2007), in Spain. 

Central to the decision framework proposed by Richardson et al. (2004) is the choice of Ca and 
a. We have illustrated this general classification problem in figure 4.1 on page 69. From this figure, 
it is clear that the choice of the risk threshold Ca, apart from its dependence on a, is otherwise 
arbitrary. Therefore, Ca may be chosen on the basis of previous research, and will ultimately 
depend on the subjective evaluation of the spatial epidemiologists or applied statisticians working 
on that particular classification problem. This decision will generally be informed by previous 
epidemiological studies of identical or comparable risk surfaces. The choice of a, however, is 
statistically problematic in the sense that its value will directly determine the sensitivity and the 
specificity of the classification procedure. In selecting a value for a, one is faced with issues which 
are reminiscent of the ones encountered by a statistician of a frequentist persuasion when deciding 
which level of significance to adopt. 

Moreover, it should be noted that, in order to obtain good discrimination using their classifi- 
cation procedure, Richardson et al. (2004) made use of a value of Ca substantially different from 
the desired threshold. (That is, different statistical models were given different decision rules with 
different associated Ca-) This is another undesirable consequence of the use of a particular cut-off 
point a for optimal classification: it requires the choice of a threshold, which usually differs from 
the true threshold of interest. We may, for instance, have a target threshold, which we will denote 
by C and a decision rule D(Ca,o:) with Ca 7^ C. This particular dependence on a justifies our 
choice of notation, as we have indexed the threshold Ca in order to emphasise that its value may 
vary for different choices of a. Richardson et al. (2004) therefore adopted different decision rules 
for different models, depending on the spread of the ensemble distribution estimated by these 
different models. They studied the BYM model and the spatial mixture (MIX) model, introduced 
by Green and Richardson (2002), and chose DbymCI-O, 0.8) and -Dmix(1-5, 0.05), respectively, as 
decision rules for these two models, in order to produce a classification of areas into two clusters: 




69 



non-elevated and elevated risk areas. In this chapter, we will address the two main quandaries 
associated with the classification method proposed by Richardson et al. (2004), which are (i) the 
dependence of in D{-, •) on the choice of a, and (ii) the dependence of the entire decision rule 
on the choice of model. It should be noted that these limitations wore discussed by Richardson 
et al. (2004), who emphasised the difficulties associated with the ''calibration^^ of these decision 
rules, when considering different models, and in particular with the use of the MIX model. 

Our approach to this classification question in spatial epidemiology will draw extensively from 
the work of Lin et al. (2006), who investigated rank-based classification problems in BHMs. Al- 
though this perspective on the classification of a subset of elements in an ensemble substantially 
differs from the problem at hand, we will see that one can easily introduce similar loss families 
in the context of threshold-based classification -that is, classification based on a particular cut- 
off point expressed on the scale of the parameters of interest, as opposed to Lin et al.'s (2006) 
rank-based classification problems, where we arc given a proportion (e.g. top 20%) of elements 
in the ensemble, which we need to identify. Despite the inherent similarities between rank-based 
and threshold-based classifications, we will see that the optimal minimisers of these losses are 
substantially different. Moreover, experimentation with simulated data will show that different 
plug-in estimators are quasi-optimal under these two decision-theoretic paradigms. One of the 
clear advantages of our proposed classification framework is that it does not require any calibra- 
tion, as the rcsiilting estimates arc optimal for a specific choice of threshold C, and thus do not 
necessitate any subsequent tuning from the epidemiologist or applied statistician conducting the 
classification. 

The chapter is organised as follows. In section 4.2, we introduce both the threshold-based and 
rank-based classification frameworks, and describe their respective optimal estimators. In section 
4.3, we illustrate the use of these techniques with a non-spatial set of synthetic simulations. In 
section 4.4, the performance of the plug-in estimators of interest is evaluated within the context 
of a spatially structured set of data simulations. In section 4.5, these methods are illustrated with 
a real data set describing MRSA prevalence in UK hospitals. Finally, we close the chapter by 
discussing the broader implications of these results for epidemiologists and spatial epidemiologists 
under the light of Bayesian decision theory, especially with respect to the choice of a particular 
set of point estimates for a given set of data. 

4.2 Classification Losses 

In this section, we present our general decision-theoretic framework for the classification of elements 
of a parameter ensemble above or below a given threshold. Wc also introduce the rank-based 
classification framework introduced by Lin et al. (2006). For both types of classification schemes, 
we will consider possible plug-in estimators that can be used in the place of the optimal estimators 
for these particular loss functions. Some links with the concepts of posterior sensitivity and 
specificity will also be described for both families of loss functions. 

4.2.1 Threshold Classification Loss 

We first describe our proposed family of loss functions, following the same structure advanced 
by Lin et al. (2006). Here, wc are given a particular cut-off point C. The loss associated with 
misclassification either above or below the threshold of interest will be formulated as follows. 
Following standard statistical terminology, we will express such misclassifications in terms of false 
positives (FPs) and false negatives (FNs). These concepts are formally described as 



FP{c,e,s) :=i{e<c,s>c}, 



(4.1) 



and 



FN(c, e, 6) — lie >C,d<C}, 



(4.2) 
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which correspond to the occurrence of a false positive misclassification (type I error) and a false 
negative misclassification (type II error), respectively. In equation (4.1), our decision -denoted S- 
is above threshold, whereas the true value of the parameter -denoted 6- is below threshold, and 
the reverse situation can be observed in equation (4.2). 

For the decision problem to be fully specified, we need to choose a loss function based on the 
sets of unit-specific FPs and FNs. We here assume that C e R if the parameters of interest are real 
numbers, or C € 1R+ if the parameters of interest are strictly positive such as when considering 
RRs in the context of spatial epidemiology. Following the decision framework introduced by Lin 
et al. (2006), we therefore formalise this problem using the threshold classification loss (TCL), 
defined as follows. We first introduce the weighted version of the TCL function, and will then 
specialise our definition to the case of unweighted threshold classification. Let < p < 1. The 
p-weiglited threshold classification loss (TCLj,) function is then defined as 

1 " 

TCLp(C,6>,5) := - ^pFP{C,ei,5i) + {1 - p)F-N{C,ei,5i). (4.3) 

One of the advantages of the choice of TCLp for quantifying the misclassifications of the elements 
of a parameter ensemble is that it is normalised, in the sense that TCLp(C, 6,6) e [0,1] for any 
choice of C and p. Moreover, TCLp attains zero if the classification of each element is correct and 
1.0 if none of the elements are correctly classified. As was previously done for the loss families 
described in the preceding chapters, we will drop references to the arguments (C, 6, 6) controlling 
TCLp{C,d,S), when their specification is obvious from the context. Our main result, in this 
section, is the following minimisation. 

Proposition 1. For some parameter ensemble 0, and given a real-valued threshold C € E, and 

p S [0, 1], we have the following optimal estimator under weighted TCL, 

ef^CL ) = argmin E [TCL^ (C, d, 5)\y], (4.4) 

5 

where ^f-^p) the vector of posterior (1 — p)-quantiles defined as 

:= {g,,|y(l - P), . . . , Q8„\y{l - P)} , (4.5) 

where Qg.^y{l — p) denotes the posterior (1 — p)-quantile of the i*^ element, 9i, in the parameter 
ensemble. Moreover, ^^^p) is not unique. That is, there exists more than one minimiser in 
equation (4.4)- 

In our notation, we have emphasised the distinction between the posterior empirical quantile 
of the ensemble and the individual posterior quantiles, as follows. In chapter 3, the posterior 
quantile of the empirical distribution of the 9i^s was denoted E[(5e(p)|y]. This quantity minimises 
the posterior expected Q-SEL function for some given p. By contrast, in this chapter, we are 
interested in the posterior quantile of a single 9i, which we have denoted by Q0^\y{p) in proposition 
1, and which is formally defined as 

Qe,\yip) := inf {x e Oi : Fe^^yix) > p} , (4.6) 

where 0j is the domain of 9i. The proof of proposition 1 makes use of a strategy similar to the 
one reported by Bergcr (1980), who showed that the posterior median is the optimal estimator 
under AVL, as reported in section 2.1.3. We prove the result by exhaustion in three cases. The 
full proof is reported in section 4.7, at the end of this chapter. Note that the fact that TCLp is 
minimised by ^Ji^p) and not Oj^^ is solely a consequence of our choice of definition for the TCL 
function. If the weighting of the FPs and FNs had been (1 — p) and p, respectively, then the 
optimal minimiser of that function would indeed be a vector of posterior p-quantiles. 

We now specialise this result to the unweighted TCL family, which is defined analogously to 
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equation (4.3), as follows, 



1 

TCL(C, e, 5) := - V FP(C;, Oi, 5i) + FN(C, e^di). (4.7) 
n ^r-i 

The minimiser of this loss function can be shown to be trivially equivalent to the minimiser of 
TCL0.5. The following corollary of proposition 1 formalises this relationship between the weighted 
and unweighted TCL. 

Corollary 1. For some parameter ensemble 6 and C e R, we have 

0MED ^ argminE [TCL(C, 6, 5)\y] , (4.8) 
s 

where 

^MED gTCL ^ {Q,^|^(0.5), . . . , g,„|y(0.5)} , (4.9) 
is the vector of posterior medians, and this optimal estimator is not unique. 

As noted earlier, this corollary immediately follows from proposition 1 by noting that 

argminE[TCL(C,6',5)|y] = argminE[TCLo.5(C, £>, 5)|y], (4.10) 
s 5 

for every C. Note that although both the classical AVL loss described in section 2.1.3 and the 
unweighted TCL presented here have identical minimisers, the relationship between these two 
estimation schemes is not trivial. Indeed, the former is a problem of estimation, whereas the 
lattca- is a problem of classification. However, it can be noted that for both the AVL and the TCL 
functions, these estimators are not unique. 

Our main focus in this chapter will be on the unweighted TCL, except in section 4.4.3, where we 
consider the relationship between the decision rule originally proposed by Richardson et al. (2004) 
and the weighted TCL. We also supplement this discussion with a set of simulations evaluating 
the performance of plug-in estimators under weighted TCL. The rest of this chapter, however, 
will focus on the unweighted TCL, and except otherwise specified, the TCL function will therefore 
refer to the unweighted version presented in equation (4.7). 

A graphical interpretation of the posterior TCL is illustrated in figure 4.2 on page 73. The 
posterior expected loss under this function takes the following form, 

„ C +00 

E[TCL(C,0,5)|y] = / dF[ei\y]I{5i > C} + j dF[6i\y]I{6i < C} , (4.11) 

»=l-cx> C 

whose formulae is derived using X{6 < C,S > C} =X{0 < C}I{5 > C}. It is of special impor- 
tance to note that when using the posterior TCL, any classification -correct or incorrect- will incur 
a penalty. The size of that penalty, however, varies substantially depending on whether or not 
the classification is correct. A true positive, as in diagram (a) in figure 4.2, can be distinguished 
from a false positive, as in diagram (c), by the fact that the former will only incur a small penalty 
proportional to the posterior probability of the parameter to be below the chosen cut-off point C. 
By contrast, a false positive, as in panel (c), will necessarily incur a larger penalty, because more 
mass is located below the threshold than above the threshold. We can make similar observations 
when comparing diagrams (b) and (d). In addition, note that although the TCL attains zero when 
the classification of each element in the ensemble is correct, this it not the case for the posterior 
expected TCL, which is necessarily greater than zero. Moreover, this representation also clarifies 
the arguments used in the proof of proposition 1, and provides an intuitive justification for the 
use of the median as an optimal estimator under unweighted TCL. 
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Figure 4.2. Illustration of the components of the posterior expected TCL based on the posterior distri- 
bution of 6i. The choice of a point estimate 5i for the quantity 6i results in either a correct classification 
(a-b) or a misclassification (c-d), with (a) and (c) representing E[FP(C, 6'i, (5i)|y] and (b) and (d) denoting 

E[FN(c,e„50ly]. 
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4.2.2 Rank Clcissification Loss 



We here briefly present a different family of classifieation loss functions based on ranks, wliic;h was 
originally introduced by Lin et al. (2006). In this chapter, we will be interested in evaluating the 
performance of various plug-in estimators for rank-based classification. 

Lin ot al. (2006) were concerned with the identification of the elements of a parameter ensemble 
that represent the proportion of elements with the highest level of risk. In an epidemiological 
setting, one may, for instance, wish to estimate the ten percent of hospitals that have the highest 
RRs for a particular condition. Such a classification is therefore based on a particular rank 
percentile cut-off denoted 7 e [0, 1] , which determines a group of areas of high-risk. That is, we 
wish to identify the areas whose percentile rank is above the cut-off point 7. Specific false positive 
and false negative functions dependent on the percentile ranks of the parameter ensemble can be 
defined following the convention introduced in equations (4.1) and (4.2). This gives 

FP(7,P,5):=I{P<7,<5>7}, (4-12) 

and 

FN(7,P,,5) :=I{P>7,<5<7}, (4.13) 

where the percentile rank, P, is a fimction of the ranks of 0, as formally defined in equation 
2.41. In this chapter, the word percentile will refer to rank percentile, except specified otherwise. 
Note that we have not made an explicit notational distinction between threshold-based FP and 
FN functions and their equivalent in a percentile rank-bascid setting. However, which one we are 
referring to should be obvious from the context. Of particular interest to our development and 
related to these families of loss functions is the following unweighted rank classification loss (RCL) 
function, 

1 " 

RCL(7,0,5) := - VFP(7,P,(0),^i)+FN(7,Pi(0),5O, (4-14) 

n 

which is the rank-based analog of the unweighted TCL described in equation (4.7). Lin et al. 
(2006) showed that the RCL function is minimised by the following estimator. We report this 
result here and refer the reader to the original paper for a proof. 

Proposition 2 (Lin et al., 2006). For any < 7 < 1, the set of percentile ranks P = 
{Pi, . . . , P„}, whose elements are defined as 

satisfies 

P = argminE [RCL(7, 0, 5)|y] . (4.16) 

s 

Moreover, the minimiser P is not unique. 

Associated with the optimal classifier P, one can also derive the optimal rank R. In order 
to gain a greater understanding of the mechanisms underlying the computation of this optimal 
minimiser, one can expand the formulae in equation (4.15), using the definition of rank in equation 
(2.40) on page 27, in order to obtain the following, 

= [^^w ^ ^ p ^p^^^) ^ }' (4.17) 

for every i = 1, . . . ,n. It can be seen from this expansion that the percentiles, which will optimally 

classify the areas above a certain threshold, are the percentiles of the posterior probabilities of the 
true percentiles being above the 7 cut-off. We have already encountered this method of double 
ranking when introducing the triple-goal estimator of Shen and Louis (1998). Proposition 2 also 
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states that the minimiser Pi is not unique. This follows from the fact that the percentile ranks 
on each side of 7 form two subsets and the elements in these subsets can be permuted while 
maintaining the optimality of the resulting percentile ranks. This non-uniqueness is especially 
illustrated by the asymptotic equivalence between the minimiser P(7|y) and another popular 
classifier in the literature: the Normand's estimator (see Normand et al., 1997). 

Albeit the TCL and RCL functions have a similar structure, they nonetheless differ in two 
important ways. Firstly, the RCL can be simplified due to the symmetry between the FPs and 
FNs, while no such symmetry can be employed to simplify the formulation of the TCL function. 
Prom equation (4.14), the RCL can be simplified as follows, 

1 " 2 " 

- J2 FP(7, Pi, Si) + FN(7, Pi, Si) = -Y^ FP(7, Pi, Si), (4.18) 

i=l i=l 

where we have used the relation, 

n n 

FP(7, P^, S^) = ™(7, P^, S^), (4.19) 

i=l i=l 

which follows from the fact that any number of FPs in the context of rank classification is neces- 
sarily compensated by an identical number of FNs. By contrast, the equivalent relationship does 
not hold for TCL. That is, the quantities 

n n 

Y,FP{C,ei,Si) and 5^FN(C,^i,(5i), (4.20) 

i=l i=l 

need not be equal. This follows from the fact that the total number of data points that should be 
classified above a particular threshold C is unknown when evaluating the TCL function. However, 
this is not the case for the RCL function, where we know a priori the total number of percentiles, 
which should be classified above the target percentile of interest, 7. That is, the RCL function is 
optimised with respect to a vector S, whose 'size' is a known quantity -where, by size, we mean 
the total number of iJ^'s above 7. By contrast, for TCL, one cannot derive the number of elements 
in the latent categories. Thus, both the allocation vector S, and the total number of elements 
above C are unknown. In this sense, the minimisation procedure for the TCL function is therefore 
more arduous than the optimisation of the RCL function. 



4.2.3 Posterior Sensitivity and Specificity 

Our chosen decision-theoretic framework for classification has the added benefit of being readily 
comparable to conventional measures of classification errors widely used in the context of test 
theory. For our purpose, we will define the Baycsian sensitivity of a classification estimator S, also 
referred to as the posterior true positive rate (TPR), as follows 

TPR (COS)- ^tinmCA,S^)\y] 

where the expectations are taken with respect to the joint posterior distribution of 6. Similarly, 
the Bayesian specificity, or posterior true negative rate (TNR), will be defined as 

TNR (r9S) - Er^iE[TN(C,0„^,)|y] 

TNR(C, e, S) .- ^;^^p[,^<c^|y] ' (4-22) 

where in both definitions, we have used TP{C,9i,Si) := I {Oi >C,Si> C} and TN(C, 6*^, (5^) := 
I{6i <C,di< C}. It then follows that we can formulate the relationship between the posterior 
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expected TCL and the Bayesian sensitivity and specificity as 



n ^ n 

E[TCL(C, e, 6)\y] = - FPR(C, 6, 5) ^Pi^i < C\y] + - FNR(C, 6, 5) ^Pl^i > C\y]. 
" i=i " i=i 

where FPR(C, 6, 5) := 1 - TNR(C, 9, 6) and FNR(C, 6, S) := 1 - TPR(C, 6», 5). 

In the RCL framework, the definitions of the TPR and TNR in equations (4.21) and (4.22) 
can be adapted to the context of percentile rank-based classification in the following manner: 

TPR (-yOS)- ELiE[TP(7,P.(0),^O|y] 

and analogously for TNR(7, 0, 5). One can easily derive the relationship between the posterior 
RCL and the percentile rank-based definitions of Bayesian sensitivity and specificity. These are 

2 " 

E[RCL(7,0,5)|y] = - FNR(7, 0, 5) VP[P,(0) > 7|y], (4.24) 
n 

1=1 

where FNR(7, 6, S) := 1 - TPR(7, 6, S), and equivalently, 

2 " 

E[RCL(7,^,5)|y] = -FPR{-f, 6,5) J^mW < llvh (4-25) 
with FPR(7, e, S) :=1- TNR(7, 6, S), which follows from equation (4.18) on page 75. 
4.2.4 Performance Evaluation 

The posterior regret will be used to compare the optimal classifiers under the TCL and RCL 
functions with different plug-in classifiers, denoted h{9^ ), for some loss function of interest L' . 
The posterior regret was introduced in section 2.5. For the unweighted TCL, this quantity takes 
the following form, 

regret(TCL, h{e^')) = E[TCL(C, 6, h{e^'))\y] - minE[TCL(C, 6, S)\y], (4.26) 

s 

where here h{-) is simply the identity function returning the original vector of point estimates, and 
the optimal estimator under TCL is the vector of posterior medians of the parameter ensemble of 
interest. Similarly, the posterior regret under RCL takes the following form, 

regret(RCL,/i(0^')) =E[RCL(7,0,P(0^'))|y] -minE[RCL(7,0,(5)|y], (4.27) 

5 

where P(-) is a multivariate percentile rank function returning the vector of percentile rank cor- 
responding to the ensemble of point estimates 6^ , and the optimal estimator under the RCL 
is the minimiser reported in proposition 2. In this chapter, we will be interested in comparing 
the optimal estimators under the TCL and RCL functions with different plug-in estimators based 
on various ensembles of point estimates including the set of MLEs, the posterior means and the 
WRSEL, CB and GR ensembles, as described in chapter 2. 

Computationally, the posterior regrets of the TCL and RCL functions for several plug-in esti- 
mators can be calculated at a lesser cost by noting that both functions simply require the compu- 
tation of the vector of penalties for misclassification once. Consider the posterior TCL first. Let 
us denote the allocation vector for some classification estimator 6^ above and below the chosen 
threshold C by 

z>(0^') := {X{e^' > C}, . . . ,1{9^' > C}} , (4.28) 
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and z^, respectively. In addition, we define the following vector of posterior probabilities, 

pU0) ■■= > C|y], . . . , P[^„ > C|y]} , (4.29) 

with being defined similarly for the n-dimensional vector of posterior probabilities of each 9i 
being below C. Then, the posterior TCL used as a criterion for evaluation in this section can 
simply be computed as 

E[TCL(C, e, S)\y] = - (z^, p>) + - (z>, p§) , (4.30) 

where (•, •) stands for the dot product. It is therefore sufficient to compute the vectors of penalties 
-p^ and p^" once and then apply it to any vector of plug-in classifiers, . This formulae has also 
the added advantage of providing some insights into the generating process of the threshold-based 
loss quantities. Equation (4.30) shows that the posterior TCL is the sum of two dot products, 
where the allocations of the elements and posterior probabilities of these elements being above 
or below a threshold have been inverted. An equivalent computational formulae can be derived 
for the posterior expected RCL. For the percentile rank-based performance, we evaluated the 
percentile-based classifiers, denoted P(0^ ) := {Px(Q^ ),■•■, Pn{^^ )}> using the following simpli- 
fied Bayesian expected loss, 

E[RCL(7,0,P(r))|y] = ^(z|,P>) = ^(<,p|)> (4-31) 

which can be derived from an application of equation (4.18), and where we have defined Zy, z^, 
and p^, following the convention introduced in equations (4.28) and (4.29), respectively. 
For all simulated scenarios and replicate data sets in the sequel, when considering the perfor- 
mance of plug-in estimators under TCL, we fixed the threshold C to the following value, 

C--=~S+[^Y.^{Oi-^"j ■ (4.32) 

with := 9i being the true ensemble mean, and d denotes the true ensemble distribution. 

That is, we evaluated the ability of the target plug-in estimators to determine which elements in 
the parameter ensemble should be classified as farther than one standard deviation from the 
true empirical mean. For the RCL, we fixed 7 = .80 for all scenarios and replicates, which 
corresponded to the identification of the top 20% of ensemble's elements taking the largest values. 
These conventions for the TCL and RCL functions were applied to both the non-spatial and spatial 
simulation experiments. 



4.3 Non-spatial Data Simulations 

The full description of the synthetic data sets used in this simulation experiment have been fully 
described in chapter 3. We are here concerned with two BHMs: (i) the compound Gaussian and 
(ii) compound Gamma models. In the next section, we briefly describe which parameters will be 
the object of the classification scheme in each model family and recall the two main factors, which 
were manipulated in these simulations. 

4.3.1 Parameter Ensembles and Experimental Factors 

Our choice of simulation models and the specific values given to the hyperparamcters parallel the 
first part of the simulation design conducted by Shen and Louis (1998). As in chapter 3, the same 
models were used to generate and to fit the data. The parameter ensembles, denoted 9, in both 
the compound Gaussian and compound Gamma models were classified according to the threshold 
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Table 4.1. Posterior regrets based on TCL(C, 0,^), with C — ¥,[0] + sd[9], for five plug-in estimators 
for the conjugate Gaussian model in equation (2.20) and the conjugate Gamma model in equation (2.21), 
and for 3 different levels of RLS, and 3 different values of n, averaged over 100 replicate data sets. The 
posterior expected loss of the optimal estimator is given in the first column. Entries are scaled by a factor 
of 10^. The posterior regrets expressed as percentage of the posterior loss under the optimal estimator is 
indicated in parentheses. 



Scenarios Posterior regrets'^ 







TCL 


MLE 


SSEL 


WRSEL 


CB 




GR 




RLS = 1 






















N-N, n = 


100 


133.6 


49.9 


(37) 


0.0(0) 


16.6 (12) 


13.3 


(10) 


13.3 


(10) 


N-N, n = 


200 


133.1 


49.1 


(37) 


0.0(0) 


69.7 (52) 


13.2 


(10) 


13.8 


(10) 


N-N, n = 


1000 


133.2 


48.8 


(37) 


0.0(0) 


167.3(126) 


13.9 


(10) 


14.1 


(11) 


G-IG, n = 


= 100 


109.7 


74.0 


(67) 


3.5(3) 


35.9 (33) 


37.3 


(34) 


31.2 


(28) 


G-IG, n = 


= 200 


102.4 


72.1 


(70) 


2.9(3) 


100.8 (98) 


33.7 


(33) 


32.1 


(31) 


G-IG, n = 


= 1000 


99.1 


72.4 


(73) 


2.6(3) 


268.0(270) 


29.4 


(30) 


33.3 


(34) 


RLS = 20 






















N-N, n = 


100 


127.9 


61.3 


(48) 


0.0(0) 


15.0 (12) 


15.2 


(12) 


17.2 


(13) 


N-N, n = 


200 


131.4 


59.5 


(45) 


0.0(0) 


63.4 (48) 


14.3 


(11) 


18.8 


(14) 


N-N, n = 


1000 


128.5 


59.5 


(46) 


0.0(0) 


173.0(135) 


14.1 


(11) 


18.7 


(15) 


G-IG, n = 


= 100 


102.9 


68.2 


(66) 


2.3(2) 


28.6 (28) 


30.1 


(29) 


30.1 


(29) 


G-IG, n = 


= 200 


100.4 


67.7 


(67) 


2.5(3) 


94.6 (94) 


28.1 


(28) 


31.5 


(31) 


G-IG, n = 


= 1000 


93.9 


69.0 


(73) 


2.2(2) 


276.2(294) 


25.0 


(27) 


31.8 


(34) 


RLS = 100 




















N-N, n = 


100 


122.8 


70.5 


(57) 


0.0(0) 


12.6 (10) 


14.9 


(12) 


23.4 


(19) 


N-N, n = 


200 


123.3 


68.4 


(56) 


0.0(0) 


53.6 (43) 


15.2 


(12) 


23.3 


(19) 


N-N, n = 


1000 


123.4 


70.6 


(57) 


0.0(0) 


174.1(141) 


14.6 


(12) 


24.3 


(20) 


G-IG, n = 


= 100 


99.2 


62.8 


(63) 


2.5(3) 


24.1 (24) 


25.4 


(26) 


29.9 


(30) 


G-IG, n = 


= 200 


93.5 


64.4 


(69) 


1.9(2) 


78.8 (84) 


23.5 


(25) 


30.9 


(33) 


G-IG, n = 


= 1000 


93.2 


63.3 


(68) 


2.0(2) 


314.9(338) 


21.7 


(23) 


32.1 


(34) 



Entries for the posterior regrets have been truncated to the closest first digit after the decimal point, 
and entries for the percentage regrets have been truncated to the closest integer. For some entries, 
percentage regrets are smaller than 1 percentage point. 



described in equation (4.32). This choice of classification threshold resulted in approximately 16% 

and 10% of the ensemble distribution being above C for the Normal-Normal and Gamma-Inverse 
Gamma models, respectively. This choice of C therefore produced a proportion of above-threshold 
ensemble elements of the same order of magnitude as the one used for the evaluation of the posterior 
expected RCL, which was 7 = .80. As in chapter 3, we manipulated the ratio of the largest to the 
smallest (RLS) sampling variances in both models. In addition, we also considered the effect of an 
increase of the size of the parameter ensemble on the posterior expected TCL and RCL functions. 
The plug-in estimators of interest were the classifiers constructed on the basis of the ensemble of 
MLEs, posterior means, as well as the WRSEL, CB and GR plug-in classifiers. 

4.3.2 Plug-in Estimators under TCL 

The results of these simulations for the TCL function are summarised in table 4.1 on page 78. As 
in chapter 3, we have reported these results in terms of both absolute and percentage posterior 
regrets. The percentage regret expresses the posterior regret of a particular plug-in estimator as 
a proportion of the posterior expected TCL under the optimal estimator, 0^^^. 

Overall, the ensemble of posterior means exhibited the best performance over all simulation 
scenarios considered. The SSEL plug-in classifiers was found to be substantially better than all the 
other plug-in estimators in terms of posterior regrets. Moreover, since the set of optimal classifiers 
under the TCL function is the ensemble of posterior medians, the performance of the ensemble 
of posterior means was found to be no different from the one of the optimal classifiers under the 
compound Gaussian model. The performance of the SSEL plug-in estimators also approached 
optimality under the compound Gamma BHM, with its percentage posterior regret not exceeding 
3%. Of the remaining four plug-in classifiers, the CB and GR ensembles of point estimates 
performed best. The performance of these two ensembles of point estimates was approximately 
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Table 4.2. Posterior regrets based on RCL(7,0,<5), with 7 := .80, for five plug-in estimators, for the 
conjugate Gaussian model in equation (2.20) and the conjugate Gamma model in equation (2.21), and for 
3 different levels of RLS, and 3 different values of n, averaged over 100 replicate data sets. The posterior 
expected loss of the optimal estimator is given in the first column. Entries are scaled by a factor of 10^. 
The posterior regrets expressed as percentage of the posterior loss under the optimal estimator is indicated 
in parentheses. 



Scenarios Posterior regrets'^ 





RCL 


MLE 


SSEL 




WRSFJ, 

V V J- V.O J—J 


CB 




GR 




RLS = 1 




















N-N, n = 100 


173.01 


0.09 (0) 


0.03 


(0) 


0.06 (0) 


0.03 


(0) 


0.05 


(0) 


N-N, n = 200 


172.53 


0.10 (0) 


0.04 


(0) 


0.15 (0) 


0.04 


(0) 


0.04 


(0) 


N-N, n = 1000 


172.19 


0.08 (0) 


0.03 


(0) 


1.41 (1) 


0.03 


(0) 


0.04 


(0) 


G-IG, n = 100 


221.72 


0.11 (0) 


0.06 


(0) 


0.11 (0) 


0.06 


(0) 


0.04 


(0) 


G-IG, n = 200 


221.60 


0.11 (0) 


0.05 


(0) 


0.16 (0) 


0.05 


(0) 


0.04 


(0) 


G-IG, n = 1000 


221.21 


0.12 (0) 


0.06 


(0) 


0.89 (0) 


0.06 


(0) 


0.05 


(0) 


RLS = 20 




















N-N, n = 100 


169.13 


9.64 (6) 


0.91 


(1) 


0.18 (0) 


0.91 


(1) 


2.10 


(1) 


N-N, n = 200 


172.33 


9.07 (5) 


0.76 


(0) 


0.77 (0) 


0.76 


(0) 


2.05 


(1) 


N-N, n = 1000 


169.80 


8.91 (5) 


0.79 


(0) 


9.88 (6) 


0.79 


(0) 


2.04 


(1) 


G-IG, n = 100 


216.42 


2.60 (1) 


0.08 


(0) 


0.15 (0) 


0.08 


(0) 


0.20 


(0) 


G-IG, n = 200 


215.70 


2.03 (1) 


0.06 


(0) 


0.26 (0) 


0.06 


(0) 


0.22 


(0) 


G-IG, n = 1000 


215.53 


2.08 (1) 


0.05 


(0) 


2.04 (1) 


0.05 


(0) 


0.22 


(0) 


RLS = 100 




















N-N, n = 100 


169.24 


20.68(12) 


1.95 


(1) 


0.39 (0) 


1.95 


(1) 


4.24 


(3) 


N-N, n = 200 


167.74 


19.83(12) 


1.93 


(1) 


1.77 (1) 


1.93 


(1) 


3.96 


(2) 


N-N, n = 1000 


167.94 


19.53(12) 


2.20 


(1) 


16.47(10) 


2.20 


(1) 


4.76 


(3) 


G-IG, n = 100 


209.71 


4.07 (2) 


0.08 


(0) 


0.22 (0) 


0.08 


(0) 


0.58 


(0) 


G-IG, n = 200 


212.23 


3.39 (2) 


0.11 


(0) 


0.68 (0) 


0.11 


(0) 


0.71 


(0) 


G-IG, n = 1000 


210.66 


3.08 (1) 


0.11 


(0) 


6.81 (3) 


0.11 


(0) 


0.69 


(0) 



Entries for the posterior regrets have been truncated to the closest second digit after the decimal point, 
and entries for the percentage regrets have been truncated to the closest integer. For some entries, 
percentage regrets are smaller than 1 percentage point. 



equivalent over the range of scenarios studied. However, some differences are nonetheless notable. 

The CB plug-in classifiers tended to exhibit a better performance than the triple-goal classifiers 
under the compound Gaussian model. This superiority of the CB plug-in estimator over the GR 
one was accentuated by an increase in RLS. This trend was also true under the compound Gamma 
model, for which the CB plug-in classifiers modestly outperformed the triple-goal estimator. 

The WRSEL and MLE plug-in classifiers exhibited the worst performance. Although the 
WRSEL plug-in classifier tended to outperform the MLE-based one for small parameter ensembles, 
the posterior regret of the WRSEL rapidly increased with n. The poor performance of the WRSEL 
plug-in estimator can here be explained using the same arguments that we have discussed in 
section 3.3.5 on page 44. The effect of the vector of weights in the WRSEL function is indeed 
highly sensitive to the size of the parameter ensemble, as was described in equation (3.27) and can 
be observed from figure 3.1. In addition, increasing the size of the parameter ensemble resulted in 
a modest reduction of the posterior expected TCL under the optimal estimator for the compound 
Gamma model, but not for the compound Gaussian model. Among the plug-in estimators, only 
the CB classifier benefited from an increase in n, and this improvement was restricted to the 
compound Gamma model. For all other plug-in estimators, no systematic effect of the size of the 
parameter ensemble could be identified. We now turn to the RCL simulation results, which are 
characterised by a different ordering of the plug-in classifiers. 

4.3.3 Plug-in Estimators under RCL 

Table 4.2 on page 79 documents the performance of the different classifiers under posterior expected 
RCL. Overall, all estimators performed well with posterior regrets typically within a few percentage 
points of the optimal estimator. The ensemble of posterior means and the CB plug-in classifier 
performed slightly better than the other plug-in estimators considered. In fact, these two families 
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of classifiers exhibited identical performances on all the simulation scenarios. (This can be observed 
by comparing columns 2 and 8 of table 4.2 on page 79). This is due to the fact that the CB plug- 
in estimator is simply a monotonic function of the posterior means, which preserves the ranking 
of this ensemble of point estimates. That is, from equation (2.53) in chapter 2, the CB point 
estimates are defined as 

:= w^fS^L ^ (1 _ ^)^SSEL^ (4 33^ 

and therefore it follows that 

n n 
i?,(0SSEL) ^ ^ J I^SSEL > ^SSELj ^ j^CB > ^Cb| ^ JJ.(0CB)^ (4 34^ 

i=i i=i 

which accounts for the identical posterior regrets of these two plug-in estimators in table 4.2. 
The GR plug-in classifiers exhibited the third best percentage regret performance after the SSEL 
and CB estimators. This is somewhat surprising since the ranks of the GR point estimates arc 
explicitly optimised as part of the triple-goal procedure. However, this optimisation of the ranks 
is conditional upon the optimisation of the EDF of the parameter ensemble, and therefore the GR 
point estimates arc only expected to produce quasi-optimal ranks. The WRSEL plug-in classifier 
tended to outperform the MLE one, although the latter appeared to do better for RLS = 1. 

Manipulation of the RLS experimental factor resulted in a substantial increase of the posterior 
regrets for all plug-in estimators under both the compound Gaussian and compound Gamma 
models. This increase in posterior regret due to higher RLS was especially severe for the compound 
Gaussian model. Under this model, all plug-in estimators performed worse as the RLS increased. 
This may be explained in terms of the relationship between the level of the RLS factor and the 
ranking of the different elements in a parameter ensemble. That is, the larger the RLS factor, 
the greater was the influence of hierarchical shrinkage on the ordering of the ensemble's elements, 
since the ^i's with larger sampling variance tended to be more shrunk towards the prior mean. It 
was difficult to pinpoint the exact effect of changes in the size of the parameter ensemble on the 
performance of the various plug-in estimators. As was noted before, the posterior regret associated 
with the use of the WRSEL plug-in estimator substantially deteriorated as n increased (sec section 
3.3.5). However, no systematic trend related to an increase of the size of the parameter ensembles 
seemed to emerge for the other plug-in estimators. 

In contrast to threshold-based classification, most of the different plug-in estimators under the 
RCL function exhibited performance very close to the one of the optimal estimator. In particular, 
one can observe that all plug-in classifiers produced a percentage posterior regret of in at least 
one of the simulation scenarios. This starkly contrasts with the behaviour of the same ensembles 
of point estimates under the TCL function. This set of non-spatial simulations therefore high- 
lights a pressing need for carefully choosing plug-in estimators when considering threshold-based 
classification. However, such a choice appears to be somewhat less consequential when considering 
rank-based classification. 

4.4 Spatial Simulations 

We now turn to the set of spatial simulations, which have been conducted in order to evaluate 
the performance of the diff'erent plug-in classifiers under both the TCL and RCL functions for 
spatially structured parameter ensembles. In addition, we also consider the weighted TCL func- 
tion, as this can be shown to be equivalent to the decision rule proposed by Richardson et al. 
(2004). The complete description of these synthetic data sets was provided in section 3.4. In this 
section, we solely highlight the specific aspects of the simulations, which are directly relevant to 
the classification problem at hand. 
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4.4.1 Parameter Ensembles and Experimental Factors 

Wc investigated four different spatially structured scenarios: SCI, characterised by one isolated 
cluster of elevated risk; SC2, composed of a mixture of isolated areas and isolated clusters of 
elevated risk; SC3, where the spatial structure of the level of risk was generated by the Matern 
fmiction; and SC4, where the risk surface was indexed by a hidden covariate. In addition, we varied 
the heterogeneity of the risk surface in all of these scenarios by modifying the data generating 
process, in order to produce a greater dispersion of the true RRs. We recovered these simulated 
levels of risk by using two different models: BYM and the robustified Laplace version of the BYM 
model, which is denoted by LI. These two models were fully described in section 3.4. Finally, we 
also varied the level of the expected counts in order to asses the sensitivity of the performance 
of the classification estimators to modifications in the amount of information available from the 
data. All these analyses were conducted within the WinBUGS 1.4 software (Lunn et al., 2000). As 
for the non-spatial data simulations, we evaluated the different classifiers using two loss criteria: 
the posterior expected TCL and the posterior expcc;tc;d RCL functions. Moreover, we considered 
the performance of five different plug-in classifiers based on the MLE, SSEL, WRSEL, CB and 
triple-goal ensembles of point estimates. These different plug-in estimators were compared using 
the posterior regrets of the TCL and RCL functions with respect to the vector of RRs, denoted 6 
in the models of interest. We discuss the performance of the different plug- in classifiers under the 
(i) unweighted TCL, (ii) weighted TCL and (iii) RCL functions, in turn. Finally, we evaluate the 
consequences of scaling the expected counts on the performance of the various plug-in estimators 
of interest. 

All the results presented in this section are expressed as posterior expected losses, which are 
functions of the joint posterior distribution of the parameter ensemble of interest. Naturally, this is 
highly dependent on our choice of model. For completeness, we therefore also compared the optimal 
classifiers with respect to the true values of the simulated data. For the BYM model, the use of 
the optimal classifier (i.e. the vector of posterior medians) yielded 8.1% of misclassifications, on 
average over all simulation scenarios. This misclassifcation rate was slightly higher for the Laplace 
model at 8.6%. 

4.4.2 Plug-in Estimators under Unweighted TCL 

The results of the spatial simulations under the posterior expected TCL function arc reported in 
tables 4.3 on page 82. As before, the percentage regrets associated with each plug-in estimator is 
reported in parentheses. Overall, the SSEL plug-in classifiers were found to be quasi-optimal for 
all simulation scenarios. This may be explained by the fact that in both models of interest, the 
full conditional distribution of each 9i was a symmetric distribution. 

The ordering of the remaining plug-in classifiers in terms of percentage regrets varies depending 
on the type of spatial scenarios considered. Overall, the CB classifier was foimd to be the second 
best plug-in estimator after the ensemble of posterior means. This was true for all spatial scenarios, 
except under SC2, where the triple-goal estimator demonstrated better performance than the CB 
classifier, except when considering a high degree of variability in the parameter ensemble. The 
MLE-based classifiers behaved poorly throughout the entire set of simulations, and the WRSEL 
plug-in classifier was consistently outperformed by its counterparts, except for the SCI scenario 
and under a high level of variability of the ensemble distribution. 

Increasing the heterogeneity of the parameter ensemble tended to produce easier classification 
problems for the plug-in estimators. This was especially apparent when examining the posterior 
TCL under the optimal classifier, in table 4.3, where one can observe that the posterior expected 
losses diminish with increased levels of heterogeneity, in all four spatial scenarios. That is, most 
plug-in estimators appeared to benefit from an increase of the parameter ensemble's dispersion. 
In terms of relative performance as measured by percentage posterior regret, the areas in the 
SCI scenario were found to be easier to classify for all plug- in classifiers when greater levels of 
heterogeneity was considered. Under SC2 and to a much lesser extent under SC3, only the MLE 
and CB classifiers appeared to benefit from an increase in the parameter ensemble's dispersion. 
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Table 4.3. Posterior regrets based on TCL(C, 0, 5) with C := E[0] +sd[0] for five plug-in estimators and 
with the posterior expected loss of the optimal estimator in the first column. Results are presented for 3 
different levels of variability and for 4 spatial scenarios: an isolated cluster (SCI), a set of isolated clusters 
and isolated areas (SC2). highly structured spatial heterogeneity (SC3), and spatial structure generated 
by a hidden covariate (SC4). Entries were scaled by a factor of 10^, with posterior regrets expressed as 
percentage of the posterior loss under the optimal estimator indicated in parentheses. 



Scenarios Posterior regrets'^ 





TCL 


MLE 


SSEL 


WRSEL 


CB 


GR 


r.niii \//Tr'i/~ih 






















71.6 


110.3 


{1-0^) 


0.1 




117.9 


(165) 


1 2 9C1 


20 1 




49.8 


39.2 




0.1 




151.3 


(304) 


1 n 6^2^ \ 


4 n Csi 




67.9 


26.8 




0.1 




156.2 


(230) 


1 7 (iS 


2 4 (A) 


RYM-Sr!4 


51.4 


33.1 




0.1 


roi 


191.8 


(373) 


4 4 fQ") 


4 3 (R) 


Ll-SCl 


76.1 


107.4 


(141) 


0.1 


(0) 


115.7 


(152) 


14.4(19) 


23 1 r3ni 


L1-SC2 


52.0 


37.8 


(73) 


0.1 


(0) 


159.9 


(308) 


12.7(24) 


5.6(11) 


L1-SC3 


78.2 


20.7 


(26) 


0.2 


(0) 


159.0 


(203) 


3.1 (4) 


3.6 (5) 


L1-SC4 


60.4 


25.9 


(43) 


0.1 


(0) 


197.0 


(326) 


5.5 (9) 


6.5(11) 


Med. Variab. 




















RYM-SCl 


26.6 


25.2 


(95) 


0.0 


(0) 


78.6 


(296) 


1.1 (4) 


7.9(30) 


RYM-SC2 


44.1 


15.2 


(34) 


0.1 


(0) 


104.9 


(238) 


4.9(11) 


1.8 (4) 


BYM-SC3 


42.6 


10.9 


(25) 


0.1 


(0) 


110.9 


(260) 


0.8 (2) 


1.6 (4) 


BYM-SC4 


36.8 


12.7 


(35) 


0.0 


(0) 


178.2 


(484) 


1.7 (5) 


2.2 (6) 


Ll-SCl 


31.3 


23.3 


(74) 


0.0 


(0) 


83.8 


(267) 


1.4 (4) 


10.5(33) 


L1-SC2 


44.1 


13.8 


(31) 


0.1 


(0) 


113.2 


(257) 


5.3(12) 


2.0 (5) 


L1-SC3 


48.3 


7.9 


(16) 


0.1 


(0) 


119.9 


(248) 


1.6 (3) 


2.1 (4) 


L1-SC4 


40.3 


9.7 


(24) 


0.1 


(0) 


191.3 


(475) 


2.4 (6) 


2.6 (6) 


High Variab. 




















BYM-SCl 


2.9 


1.5 


(52) 


0.0 


(0) 


0.7 


(25) 


0.0 (0) 


0.2 (6) 


BYM-SC2 


25.5 


3.9 


(15) 


0.0 


(0) 


106.1 


(416) 


1.2 (5) 


3.7(15) 


BYM-SC3 


35.9 


8.6 


(24) 


0.1 


(0) 


107.8 


(300) 


0.6 (2) 


1.3 (4) 


BYM-SC4 


27.3 


6.0 


(22) 


0.0 


(0) 


98.7 


(361) 


0.9 (3) 


1.2 (5) 


Ll-SCl 


4.1 


1.4 


(34) 


0.0 


(0) 


1.6 


(38) 


0.1 (1) 


1.0(25) 


L1-SC2 


25.0 


3.7 


(15) 


0.0 


(0) 


107.7 


(430) 


1.0 (4) 


4.1(16) 


L1-SC3 


40.8 


5.1 


(12) 


0.1 


(0) 


116.7 


(286) 


0.9 (2) 


1.6 (4) 


L1-SC4 


28.7 


4.5 


(16) 


0.0 


(0) 


115.1 


(401) 


1.2 (4) 


1.6 (6) 



^ Entries for the posterior regrets have been truncated to the closest first digit after the decimal point, 
and entries for the percentage regrets have been truncated to the closest integer. For some entries, 
percentage regrets are smaller than 1 percentage point. 



For SC4, the CB, MLE and to a lesser extent the GR plug-in classifiers appeared to benefit from 
higher variability in the parameter ensemble. However, for the plug-in estimators, which failed to 
estimate the overall shape of the ensemble distribution, such as the WRSEL classifier, the number 
of areas above or below the prescribed threshold was more likely to be erroneous. Hence, we did 
not observe any systematic improvement of the performance of the WRSEL classifier when the 
parameter ensemble's dispersion increased. 

As was previously noted, the CAR Normal model was found to yield lower posterior losses than 
the CAR Laplace model across all the studied spatial scenarios. In chapter 3, we have emphasised 
that the simulation of discrete two-category distributions in the SCI and SC2 scenarios had detri- 
mental consequences on the quantile estimation procedures. We here make a similar caveat for the 
assessment of classification estimators. The fact that the simulated ensemble distributions in SCI 
and SC2 take a discrete form should be taken into account when evaluating the performance of 
the classification procedures under scrutiny in this chapter. In particular, the classifiers, which are 
dependent on the entire ensemble distribution will be more heavily penalised. This dependency 
was especially detrimental to the GR-based classification because each point estimate in this set of 
point estimates is dependent on the full ensemble distribution, as is visible from table 4.3 for SCI 
especially. However, this detrimental effect on the performance of the GR classifier was attenuated 
by an increase in the variability of the parameter ensemble. 
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Table 4.4. Posterior regrets based on TCLo,s{C,G,S) with C := 1.0 for five plug-in estimators and 
with the posterior expected loss of the optimal estimator in the first column. Results are presented for 3 
different levels of variability and for 4 spatial scenarios: an isolated cluster (SCI), a set of isolated clusters 
and isolated areas (SC2), highly structured spatial heterogeneity (SC3), and spatial structure generated 
by a hidden covariate (SC4). Entries are scaled by a factor of 10^, with posterior regrets expressed as 
percentage of the posterior loss under the optimal estimator in parentheses. 



Scenarios Posterior regrets' 





TCLq.s 


MLE 


SSEL 


WRSEL 


CB 


OR 


T .rill} 1/ n T''i fi h 


















RYM SPl 

JJ J. iVl OV_J± 


80 


92(114) 


58 


{1^) 


48 (59) 


60 




79 (98) 


RYM-sr;2 


52 


54(103) 


28 


(53) 


30 (58'> 


30 


(58) 


55C1 O^"! 


1 J X IVX kJ V_y (J 


39 


tjcfy X KJ\J 1 


20 




36 (91) 


20 




20 r5oi 


RYM-SC4 


47 


55(117) 


24 




44 (QaS 


28 


59) 


37 (79) 


Ll-SCl 


84 


OiJl XU J- J 


67 


(80) 


61 (72) 


69 


'82) 


OO I J.U-L J 


L1-SC2 


57 


46 (80) 


33 


(58) 


37 (64) 


35 


'62) 


57(100) 


L1-SC3 


48 


36 (75) 


27 


(55) 


38 (78) 


27 


(55) 


27 (.56) 


L1-SC4 


53 


44 (83) 


29 


(56) 


46 (88) 


32 


(61) 


42 (79) 


Med. Variab. 


















BYM-SCl 


68 


74(109) 


46 


(68) 


24 (35) 


47 


(69) 


68(101) 


BYM-SC2 


25 


24 (99) 


9 


(37) 


9 (38) 


10 


(40) 


38(156) 


BYM-SC3 


30 


26 (86) 


14 


(48) 


41(134) 


14 


(48) 


15 (51) 


BYM-SC4 


34 


35(103) 


17 


(49) 


48(139) 


18 


(54) 


24 (70) 


Ll-SCl 


70 


67 (96) 


49 


(70) 


26 (38) 


50 


(72) 


71(102) 


L1-SC2 


28 


19 (70) 


11 


(41) 


11 (42) 


12 


(45) 


41(148) 


L1-SC3 


37 


24 (64) 


18 


(50) 


38(103) 


18 


(50) 


21 (56) 


L1-SC4 


38 


30 (78) 


19 


(50) 


46(121) 


21 


(55) 


28 (74) 


High Variab. 


















BYM-SCl 


52 


55(106) 


30 


(58) 


37 (71) 


30 


(58) 


61(117) 


BYM-SC2 


7 


5 (66) 


1 


(18) 


6 (82) 


1 


(19) 


17(236) 


BYM-SC3 


28 


24 (85) 


15 


(51) 


45(159) 


15 


(52) 


15 (51) 


BYM-SC4 


26 


23 (90) 


12 


(46) 


56(217) 


13 


(49) 


17 (66) 


Ll-SCl 


52 


53(102) 


31 


(60) 


36 (69) 


31 


(60) 


62(119) 


L1-SC2 


9 


4 (45) 


2 


(24) 


6 (67) 


2 


(25) 


19(224) 


L1-SC3 


32 


22 (68) 


17 


(52) 


42(128) 


17 


(52) 


18 (56) 


L1-SC4 


28 


20 (71) 


13 


(46) 


51(179) 


14 


(48) 


20 (70) 



^ Entries for both the posterior regrets and percentage regrets have been truncated to the closest 
integers. 



4.4.3 Plug-in Estimators under Weighted TCL 

Here, we consider the performance of various plug-in estimators under a weighted TCL func- 
tion. This decision-theoretic framework reproduces the decision rule proposed by Richardson 
et al. (2004) for CAR models in the context of spatial epidemiology. In the notation adopted by 
Richardson et al. (2004), the i**^ area in an ensemble was classified as having an "increased risk" 
when the following condition was verified, 

P[^i > C\y] > p. (4.35) 

For the BYM and BYM-Ll models, these parameters were given the following values: p = .80, 
and C = 1.0. This particular decision rule can easily be seen to be equivalent to a weighted TCL 
based on 1 — p as introduced in equation 4.3, such that 

1 " 

TCLo.8(LO,0,5) := -Y,0.8FF{1.0,ei,Si) + 0.2FN{1.0,ei,6i), (4.36) 

i=l 

which implies that Richardson et al. (2004) selected a conservative decision rule, giving a larger 
penalty to FPs than to FNs. As a result, the number of potential false alarms is deprecated. This 

rule indeed requires that a large amount of probability mass (p = 0.80) is situated above threshold 
before declaring an area to have "increased risk" for a given medical condition of interest. Using 
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proposition 1, the posterior expectation of the loss function in equation (4.36) is minimised by 
the ensemble of posterior .20-quantiles, denoted ^7^20)" ^® have evaluated the different plug- 
in estimators of interest under this weighted classification loss and reported the results of these 
simulations for the CAR Normal and CAR Laplace models in table 4.4 on page 83. 

The ordering of the plug-in estimators in terms of percentage regrets reproduced the findings 
reported under the weighted TCL. Moreover, the performance of the plug-in classifiers was found 
to be approximately consistent across the different spatial scenarios considered. Overall, the SSEL 
estimator exhibited the best performance throughout this simulation study. Under the SCI and 
SC2 scenarios, the SSEL classifier was more or less on a par with the CB estimator, especially 
for medium to high heterogeneity. This can be observed by considering the first two rows of each 
of the three sections of table 4.4 on page 83. However, the WRSEL classifier was found to do 
marginally better than the SSEL and CB plug-in estimators under SC2 and for a low level of 
variability in the ensemble distribution. Under the SC3 and SC4 scenarios, the SSEL, CB and 
OR classifiers exhibited similar behaviour and outperformed their counterparts for all levels of 
heterogeneity. This can be observed by comparing the different columns in the third and fourth 
rows of each of the three sections of tabic 4.4. This set of simulations therefore showed that while 
the set of posterior means constitute a good plug-in estimator under the unweighted TCL, it also 
outperformed its counterparts under a weighted version of the same loss function, when a greater 
penalty is given to false positives. In addition, we also conducted some further simulations under 
TCLo.2, which gives a greater penalty to false negatives (results not shown). These simulations 
yielded a different ordering of the plug-in classifiers. In that case, the SSEL estimator were found 
to be outperformed by some of its counterparts under several spatial scenarios. These findings are 
contrasted and discussed in section 4.6. 

4.4.4 Plug-in Estimators under RCL 

The results of the spatial simulations under the posterior expected RCL are summarised in table 
4.5 on page 85. Overall, the SSEL and CB classifiers were foimd to outperform their counterparts 
under all the spatially structured scenarios studied. This corroborates our findings for the non- 
spatial simulations in section 4.3.3. As indicated in that section, the ranks obtained from these 
two classifiers are identical, because of the monotonicity of the CB transformation of the posterior 
means. The triple-goal estimator was also found to exhibit good performance, which closely 
followed the ones of the SSEL and CB plug-in classifiers. The MLE and WRSEL demonstrated the 
worse performance overall, with the use of the MLE classifier yielding lower percentage regret under 
the SC3 and SC4 scenarios. By contrast, the WRSEL plug-in estimator outperformed the MLE 
classifier on the SCI and SC2 spatial scenarios, albeit as the variability of the ensemble distribution 
increased, the MLE became better than the WRSEL estimator under SC2. This discrepancy in 
performance between the MLE and WRSEL classifiers may be explained by the extreme cases 
considered under the SCI and SC2 scenarios, for which the WRSEL plug-in estimator was generally 
found to outperform the MLE-based classifier. Indeed, as discussed in chapter 3, the discrete 
nature of the true ensemble distributions in both SCI and SC2 required a very high level of 
countershrinkage, which was better achieved by the artificial re-weighting of the WRSEL function 
than by the MLE classifiers. Note, however, that for the more standard SC3 and SC4 scenarios, 
the MLE plug-in estimator was found to provide better ranks than the WRSEL estimator. 

Increasing the amount of heterogeneity present in the ensemble distribution tended to system- 
atically diminish the posterior expected loss under the optimal estimator. As noted in section 
4.4.2 when considering the TCL function, greater variability in the ensemble distribution tended 
to attenuate the effect of hierarchical shrinkage on rank estimation for all plug-in estimators. The 
performance of the SSEL, CB and GR plug-in classifiers was approximately stable under different 
levels of heterogeneity. The MLE-based estimator, however, substantially benefited from increas- 
ing the dispersion of the parameter ensemble. For the WRSEL classifier, the eflFect of increasing 
the parameter ensemble's heterogeneity was more difficult to evaluate, and tended to vary with 
the spatial scenario considered. 

As for all other simulations, the posterior expected RCL when using the optimal estimator was 
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Table 4.5. Posterior regrets based on RCL(7, 6, 5) with 7 = .80, for five plug-in estimators and with the 
posterior expected loss of the optimal estimator in the first column. Results are presented for 3 different 
levels of variability and for 4 spatial scenarios: an isolated cluster (SCI), a set of isolated clusters and 
isolated areas (SC2), highly structured spatial heterogeneity (SC3), and spatial structure generated by 
a hidden covariate (SC4). Entries were scaled by a factor of 10^, with posterior regrets expressed as 
percentage of the posterior loss under the optimal estimator indicated in parentheses. 



Scenarios Posterior regrets'^ 
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SSEL 
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0.36 
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9.61 




0.11 
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0.07 


Cn") 


0.22 
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105.46 


16.45 




0.05 
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17.01 




0.05 
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0.36 
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204.99 


11.55 




0.35 


(n\ 
yy) 


2.97 




0.35 


col 


0.67 


Cn") 


L1-SC2 


120.84 


3.81 


(3) 


0.12 




9.68 


(8) 


0.12 


CO") 


0.49 


CO") 


L1-SC3 


101.70 


10.31 


(10) 


0.11 


(0) 


29.85 


(29) 


0.11 


(0) 


0.42 


(0) 


L1-SC4 


118.72 


4.79 


(4) 


0.13 


(0) 


18.33 


(15) 


0.13 


(0) 


0.62 


(1) 


Med. Variab. 
























RYM-SCl 


163.44 


16.05 


(10) 


0.33 


(0) 


5.57 


(3) 


0.33 


(0) 


0.92 


(1) 


BYM-SC2 


35.37 


2.36 


(7) 


0.03 


(0) 


7.75 


(22) 


0.03 


(0) 


0.11 


(0) 


BYM-SC3 


66.60 


12.45 


(19) 


0.03 


(0) 


22.54 


(34) 


0.03 


(0) 


0.19 


(0) 


BYM-SC4 


75.85 


7.33 


(10) 


0.03 


(0) 


21.10 


(28) 


0.03 


(0) 


0.30 


(0) 


Ll-SCl 


169.40 


5.63 


(3) 


0.42 


(0) 


5.88 


(3) 


0.42 


(0) 


1.17 


(1) 


L1-SC2 


42.21 


0.71 


(2) 


0.05 


(0) 


12.85 


(30) 


0.05 


(0) 


0.15 


(0) 


L1-SC3 


78.77 


3.64 


(5) 


0.06 


(0) 


32.29 


(41) 


0.06 


(0) 


0.37 


(0) 


L1-SC4 


83.99 


2.25 


(3) 


0.07 


(0) 


22.24 


(26) 


0.07 


(0) 


0.40 


(0) 


High Variab. 
























BYM-SCl 


152.71 


10.83 


(7) 


0.42 


(0) 


9.59 


(6) 


0.42 


(0) 


1.42 


(1) 


BYM-SC2 


8.88 


0.51 


(6) 


0.05 


(1) 


4.00 


(45) 


0.05 


(1) 


0.11 


(1) 


BYM-SC3 


61.47 


8.96 


(15) 


0.03 


(0) 


21.32 


(35) 


0.03 


(0) 


0.16 


(0) 


BYM-SC4 


57.71 


3.74 


(6) 


0.02 


(0) 


15.73 


(27) 


0.02 


(0) 


0.18 


(0) 


Ll-SCl 


155.96 


4.01 


(3) 


0.39 


(0) 


10.28 


(7) 


0.39 


(0) 


1.70 


(1) 


L1-SC2 


10.15 


0.33 


(3) 


0.06 


(1) 


5.07 


(50) 


0.06 


(1) 


0.13 


(1) 


L1-SC3 


70.20 


3.15 


(4) 


0.08 


(0) 


29.69 


(42) 


0.08 


(0) 


0.39 


(1) 


L1-SC4 


62.13 


0.91 


(1) 


0.03 


(0) 


17.94 


(29) 


0.03 


(0) 


0.23 


(0) 



Entries for the posterior regrets have been truncated to the closest second digit after the decimal 
point, and entries for tlie percentage regrets have been truncated to the closest integer. For some 
entries, percentage regrets are smaller than 1 percentage point. 



lower under the CAR Normal model than under the CAR Laplace for all scenarios considered. In 
comparison to the TCL function, one can also observe that, as discussed in section 4.3.3, the overall 
size of the posterior and percentage regrets under the RCL function was found to be substantially 
lower than under the TCL function. This indicated that the choice of a particular plug-in classifier 
is more consequential under the TCL decision-theoretic framework than under the RCL function. 

4.4.5 Consequences of Scaling the Expected Counts 

Tables 4.6 and 4.7 on pages 86 and 87 document the posterior and percentage regrets of the 
different plug-in estimators of interest under the TCL and RCL functions, respectively, for two 
different scaling of the expected counts with SF € {0.1, 2.0}. Overall, the posterior expected losses 
under both the TCL and RCL functions were found to benefit from an increase of the level of 
the expected counts. The percentage regrets of the different plug-in classifiers, however, did not 
necessarily diminish with an increase in SF. 

Under the TCL function, our results showed that the SSEL classifier tended to do better for 
higher levels of expected counts. Although the differences in posterior and percentage regrets 
tended to be small since the SSEL phig-in estimator is close to optimal under the TCL function, 
a systematic trend is nonetheless notable, whereby the SSEL classifier showed consistent improve- 
ment for higher expected counts. Similarly, the MLE and CB plug-in classifiers benefited from an 
increase in SF. Although this tended to be also true for the triple-goal classifier, its performance in 
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Table 4.6. Posterior regrets based on TCL(C, 0, 8) with C := E[0] +sd[0] for five plug-in estimators and 
with the posterior expected loss of the optimal estimator in the first column. Results are presented for 3 
different levels of variability and for 4 spatial scenarios: an isolated cluster (SCI), a set of isolated clusters 
and isolated areas (SC2), highly structured spatial heterogeneity (SC3), and spatial structure generated 
by a hidden covariate (SC4); as well as two different scaling (SF) of the expected counts. Entries were 
scaled by a factor of 10"' with posterior regrets expressed as percentage of the posterior loss under the 
optimal estimator indicated in parentheses. 



Scenarios Posterior regrets'^ 





TCL 


MLE 






WRSEL 






SF = 0.1 Low Variab. 


















BYM-SCl 


128.3 


253.5 


(198) 


0.3 


(0) 


92.0 


(72) 


75.5(59) 


A ( CO\ 

D0.4(5zJ 


BYM-SC2 


11.0 


236.6 (2153) 


0.0 


(0) 


17.8 


(162) 


5.7(52) 


6.5(59) 


BYM-SC3 


119.4 


113.5 


(95) 


n A 
U.4 


0) 


84.9 


(71) 


/I C f A\ 

4.5 (4j 


6.3 (5) 


BYM-SC4 


53.9 


204.3 


(379) 


0.1 


(0) 


ocs.y 


[loo ) 


23.9(44) 


17.4(32) 


Ll-SCl 


109.5 


270.0 


(247) 


n 1 

U.i 


1,0) 


61.8 


(56) 


oU.o( (o) 


fil 1(KfC\ 
Di. l (OD ) 


L1-SC2 


7.2 


241.1 (3346) 


0.0 


(0) 


9.2 


(127 j 


6.7(93) 


6.2(86) 


L1-SC3 


116.0 


128.7 


/I 1 1 \ 

(111) 


1 A 
1.4 




42.4 


(37) 


Zo.o\Z\i ) 


i0.i(lo ) 


L1-SC4 


45.9 


zio.y 


( A'7'7\ 


0.1 


(0) 


64.6 


^1 /1 1 \ 

(^14ij 


on A( A A\ 
zU.4(44 ) 


23.3(51) 


SF = 0.1 Med. Variab. 


















BYM-SCl 


74.8 




(29ij 


0.5 


(1) 


80.4 


(108) 


24.0(32) 


23.4(31) 


BYM-SC2 


33.5 


1 1 A 1 




0.3 


/'-\\ 
(1) 


140.9 


(420) 


13.6(41) 


10.3(31) 


BYM-SC3 


49.2 


ft/I 7 




0.3 


(1) 


118.9 


(241) 


2.4 (5) 


6.2(13) 


BYM-SC4 


55.4 


129.7 


(234) 


0.3 


(0) 


1 O /I 

lzo.4 


[ZZO ) 


lD.4(ciU) 


15.8(29) 


Ll-SCl 


48.3 


252.5 


(523) 


U.Z 




48.8 


(101) 


Z0.1(0Z ) 


OQ O/'/l Q\ 

Zo.Z (4o ; 


L1-SC2 


35.1 


i ID.D 


yoo^ ) 


0.2 


(1) 


140. U 


(^4l0 ) 


19.4(55) 


13.6(39) 


L1-SC3 


67.3 


Dl.z 




l.i 




84.5 


(125) 


1 O c: /"l n\ 
iz.o(ly ) 


y.U(lo ) 


L1-SC4 


58.7 


1 Qn Q 


( OOQ\ 
\^ZZo ) 


U.i 




1 O/l 7 


[ZLo ) 


in Q('iA\ 
iy.c5(o4J 


OQ K( A(W 

zo.O(4U J 


SF = 0.1 High Variab. 


















BYM-SCl 


50.3 


110.6 


[220) 


0.1 


(0) 


101.5 


(202) 


2.2 (4) 


17.6(35) 


BYM-SC2 


30.0 


37.0 


(124) 


0.1 


(0) 


163.8 


(547) 


o.4(loj 


A "X ('\ A\ 

4.1(14 ) 


BYM-SC3 


65.2 


oo.y 


(98) 


0.4 


(1) 


106.5 


(163) 


Q cr /r:^ 
3.0 (5) 


4.2 (6) 


BYM-SC4 


51.1 


62.9 


(123) 


0.2 


(0) 


1 QQ Q 


{o (U) 


9.2(18) 


7.4(14) 


Ll-SCl 


83.5 


82.0 


(98) 


A O 

yj.z 


/'n^ 
(,Uj 


132.4 


(159) 


11 A{A A\ 


OO '7('Q^1^ 

zy. ( (oO ) 


L1-SC2 


29.7 


oo.y 




0.2 


(1) 


166.1 


(560) 


D.D(2z) 


5.2(18) 


L1-SC3 


83.2 


61.2 


(74) 


U.D 




92.2 


(111) 


n Q o\ 
y.o(iz ) 


n n/i o\ 

y.y(iz ) 


L1-SC4 


62.1 


57.2 


(92) 


0.3 


(1) 


161.7 


(260) 


1 O O /0 1 \ 

LZ.oyZi ) 


15.1(24) 


SF = 2 Low 


Variab. 


















BYM-SCl 


36.4 


66.1 


(182) 


0.0 


(0) 


78.3 


(215) 


2.0 (6) 


16.0(44) 


BYM-SC2 


54.3 


30.0 


(55) 


0.0 


(0) 


130.9 


(241) 


9.6(18) 


3.9 (7) 


BYM-SC3 


49.6 


13.3 


(27) 


0.1 


(0) 


151.8 


(306) 


0.7 (1) 


2.8 (6) 


BYM-SC4 


45.4 


12.2 


(27) 


0.0 


(0) 


1 CIO 
lyz.o 


(4z4 ) 


1.1 (2) 


1.7 (4) 


Ll-SCl 


46.5 


59.4 


(128) 


U.U 




86.1 


(185) 


1 Q ^ A\ 

1.8 (4) 


ly .o(4o ) 


L1-SC2 


56.0 


26.4 


(47) 


0.0 


(0) 


128.1 


(229) 


9.4(17) 


2.4 (4) 


L1-SC3 


55.7 


9.2 


(16) 


0.0 


(0) 


165.1 


(296) 


2.0 (4) 


3.2 (6) 


L1-SC4 


50.0 


9.5 


(19) 


0.0 


(0) 


201.2 


(402) 


1.5 (3) 


2.4 (5) 


SF = 2 Med. 


Variab. 


















BYM-SCl 


4.7 


8.8 


(186) 


0.0 


(0) 


2.5 


(53) 


0.0 (0) 


5.0(105) 


BYM-SC2 


48.1 


11.8 


(25) 


0.1 


(0) 


99.8 


(208) 


3.0 (6) 


2.1 (4) 


BYM-SC3 


30.6 


6.3 


(21) 


0.0 


(0) 


91.1 


(298) 


0.2 (1) 


0.3 (1) 


BYM-SC4 


28.0 


7.4 


(26) 


0.0 


(0) 


109.4 


(391) 


0.4 (2) 


1.7 (6) 


Ll-SCl 


9.3 


6.7 


(72) 


0.0 


(0) 


25.0 


(269) 


0.2 (3) 


4.4(48) 


L1-SC2 


49.4 


10.0 


(20) 


0.0 


(0) 


103.3 


(209) 


4.2 (8) 


2.2 (5) 


L1-SC3 


34.0 


2.8 


(8) 


0.0 


(0) 


96.2 


(283) 


0.5 (2) 


0.8 (2) 


L1-SC4 


29.8 


5.4 


(18) 


0.0 


(0) 


124.2 


(417) 


0.6 (2) 


1.4 (5) 


SF = 2 High Variab. 


















BYM-SCl 


0.1 


0.0 


(0) 


0.0 


(0) 


0.0 


(0) 


0.0 (0) 


0.0 (0) 


BYM-SC2 


21.2 


1.1 


(5) 


0.0 


(0) 


95.8 


(452) 


0.4 (2) 


4.4(21) 


BYM-SC3 


15.3 


4.1 


(27) 


0.0 


(0) 


16.5 


(108) 


0.2 (1) 


3.5(23) 


BYM-SC4 


23.2 


2.7 


(12) 


0.0 


(0) 


58.1 


(251) 


0.4 (2) 


0.8 (4) 


Ll-SCl 


0.3 


0.0 


(0) 


0.0 


(0) 


0.0 


(0) 


0.0 (0) 


0.0 (0) 


L1-SC2 


21.0 


1.6 


(7) 


0.0 


(0) 


100.2 


(478) 


0.4 (2) 


4.7(22) 


L1-SC3 


17.8 


1.8 


(10) 


0.0 


(0) 


21.3 


(119) 


0.3 (1) 


3.6(20) 


L1-SC4 


23.8 


2.1 


(9) 


0.0 


(0) 


66.6 


(279) 


0.7 (3) 


1.0 (4) 



Entries for the posterior regrets have been truncated to the closest first digit after the decimal point, 
and entries for the percentage regrets iiave been truncated to the closest integer. For some entries, 
percentage regrets are smaller than 1 percentage point. 
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Table 4.7. Posterior regrets based on RCL(7, 6, 5) with 7 = .80, for five plug-in estimators and with the 
posterior expected loss of the optimal estimator in the first column. Results are presented for 3 different 
levels of variability and for 4 spatial scenarios: an isolated cluster (SCI), a set of isolated clusters and 
isolated areas (SC2), highly structured spatial heterogeneity (SC3), and spatial structure generated by a 
hidden covariate (SC4); as well as two different scaling (SF) of the expected counts. Entries were scaled 
by a factor of 10'' with posterior regrets expressed as percentage of the posterior loss under the optimal 
estimator indicated in parentheses. 



Scenarios Posterior regrets"' 







MLE 






WRSEL 


OB 








at =0.1 Low 


Variab. 






















BYM-bCl 


277.38 


15.82 


(6) 


1.71 


(1) 


4.32 


(2) 


1.71 


(1) 


1.77 


(1) 


BYM-bC2 


280.36 


10.29 


(4) 


1.21 


(0) 


3.71 


(1) 


1.21 


(0) 


1.43 


(1) 


oYM-dL-o 


176.56 


64.78 


(37) 


0.30 


(0) 


23.14 


(13) 


0.30 


(0) 


0.83 


(0) 


TIA/'TV If C</~1 A 

BYM-bC4 


213.14 


Afi An 


{22) 


0.50 


(0) 


/111 

4.11 


(2j 


0.50 


(0) 


0.72 


(0) 






7.11 


(2) 


n QA 

U.o4 




3.79 


(1) 


U.o'i 


(c\\ 
V>) 


u.yz 


(Uj 


L1-SC2 


293.29 


2.99 


(1) 


0.84 


(0) 


2.87 


(1) 


0.84 


(0) 


0.83 


(0) 


L1-SC3 


248.41 


13.79 


(6) 


0.34 


(0) 


5.96 


(2) 


0.34 


(0) 


0.40 


(0) 


L1-SC4 


266.64 


7.98 


(3) 


0.44 


(0) 


2.27 


(1) 


0.44 


(0) 


0.54 


(0) 


SF = 0.1 Med. 


Variab. 






















BYM-SCl 


213.78 


48.11 


(23) 


0.83 


(0) 


4.26 


(2) 


0.83 


(0) 


1.05 


(0) 


BYM-SC2 


196.60 


18.45 


(9) 


0.24 


(0) 


5.29 


(3) 


0.24 


(0) 


0.55 


(0) 


BYM-SC3 


122.31 


60.99 


(50) 


0.13 


(0) 


22.57 


(18) 


0.13 


fr\\ 

(0) 


0.30 


(0) 


BYM-SC4 


191.33 


Q A Of? 

o4.yo 




0.17 


(0) 


A '7Ci 

4. /O 


(2) 


0.17 


(0) 


0.32 


(0) 


Ll-SCl 


280.59 


4.85 


(2) 






3.17 


(1) 


u. i y 


(UJ 


U. i 


(Oj 


L1-SC2 


218.70 


6.35 


(3) 


0.24 


(0) 


3.53 


(2) 


0.24 


/r\\ 

(0) 


0.45 


(0) 


L1-SC3 


208.18 


6.12 


(3) 


0.23 


(0) 


3.23 


(2) 


0.23 


(0) 


0.31 


(0) 


L1-SC4 


230.52 


6.08 


(3) 


0.48 


(0) 


2.55 


(1) 


0.48 


{(W 

(0) 


0.61 


(0) 


SF = 0.1 High 


Variab. 






















BYM-SCl 


182.54 


44.78 


(25) 


0.47 


(0) 


4.02 


(2) 


0.47 


(0) 


1.04 


(1) 


BYM-SC2 


117.85 


15.45 


(13) 


0.07 


(0) 


12.03 


(10) 


0.07 


(0) 


0.60 


(1) 


BYM-SC3 


112.30 


49.51 


f A A\ 

(44) 


0.14 


(0) 


15.31 


f-l A\ 

(14) 


0.14 


(0) 


0.42 


(0) 


BYM-SC4 


143.06 


o4.Uc5 


(nA\ 


0.09 


(0) 


1 no 


/'c^ 
\P) 


0.09 


(0) 


0.40 


(0) 


Ll-SCl 


207.60 


6.85 


/'Q^ 


n 1 
u.iy 


K") 


2.61 


(1) 


n 1 
U.iy 


(f\\ 

(Oj 


U.Dl 


(Oj 


L1-SC2 


135.69 


4.25 


(3) 


0.05 


(0) 


7.35 


(5) 


0.05 


(Oj 


0.34 


((\\ 
(Oj 


L1-SC3 


155.10 


23.98 


(15) 


0.19 


(0) 


5.43 


(4) 


0.19 


(r\\ 

(0) 


0.75 


(Oj 


L1-SC4 


179.98 


1.97 


(1) 


0.10 




3.94 


(2) 


0.10 


W 


0.36 


W 


SF = 2 Low Variab. 






















BYM-SCl 


176.52 


30.11 


(17) 


0.36 


(0) 


4.98 


(3) 


0.36 


(0) 


1.25 


(1) 


BYM-SC2 


66.23 


7.20 


(11) 


0.01 


(0) 


17.47 


(26) 


0.01 


(0) 


0.12 


(0) 


BYM-SC3 


82.39 


13.60 


(17) 


0.05 


(0) 


21.08 


(26) 


0.05 


(0) 


0.38 


(Oj 


BYM-SC4 


79.16 


0.00 


('7\ 


0.08 


(0) 


01 QO 

zi.oy 


(28) 


0.08 


(0) 


0.21 


(Oj 


Ll-SCl 


186.73 


16.80 


(9) 


U. / 




4.61 


(2) 


U. i D 


(OJ 


^ ^ A 

1.14 


(Ij 


L1-SC2 


73.88 


2.39 


(3) 


0.03 


(0) 


21.69 


(29) 


0.03 


(0) 


0.33 


(Oj 


L1-SC3 


90.87 


5.99 


(7) 


0.12 


(0) 


36.52 


(40) 


0.12 


(0) 


0.57 


(1) 


L1-SC4 


85.05 


1.00 


(1) 


0.04 


(0) 


23.86 


(28) 


0.04 


(0) 


0.31 


(0) 


SF = 2 Med. Variab. 






















BYM-SCl 


168.41 


19.20 


(11) 


0.51 


(0) 


7.57 


(4) 


0.51 


(0) 


1.17 


(1) 


BYM-SC2 


26.11 


2.37 


(9) 


0.31 


(1) 


14.22 


(54) 


0.31 


(1) 


0.40 


(2) 


BYM-SC3 


46.24 


8.51 


(18) 


0.01 


(0) 


19.38 


(42) 


0.01 


(0) 


0.01 


(Oj 


BYM-SC4 


54.57 


2.33 


(4) 


0.00 


(0) 


14.51 


(27) 


0.00 


(0) 


0.17 


(Oj 


Ll-SCl 


172.85 


7.70 


(4) 


0.45 


(0) 


9.23 


(5) 


0.45 


(0) 


1.67 


(1) 


L1-SC2 


25.80 


2.41 


(9) 


0.08 


(0) 


15.72 


(61) 


0.08 


(0) 


0.36 


(1) 


L1-SC3 


52.17 


3.37 


(6) 


0.04 


(0) 


23.93 


(46) 


0.04 


(0) 


0.11 


(0) 


L1-SC4 


58.02 


0.77 


(1) 


0.03 


(0) 


23.41 


(40) 


0.03 


(0) 


0.31 


(1) 


SF = 2 High Variab. 






















BYM-SCl 


163.70 


9.72 


(6) 


1.17 


(1) 


9.28 


(6) 


1.17 


(1) 


1.91 


(1) 


BYM-SC2 


19.77 


0.50 


(3) 


0.06 


(0) 


15.13 


(77) 


0.06 


(0) 


0.38 


(2) 


BYM-SC3 


41.49 


4.46 


(11) 


0.05 


(0) 


15.40 


(37) 


0.05 


(0) 


0.07 


(Oj 


BYM-SC4 


45.21 


2.15 


(5) 


0.00 


(0) 


15.45 


(34) 


0.00 


(0) 


0.00 


(Oj 


Ll-SCl 


167.87 


5.38 


(3) 


0.37 


(0) 


8.27 


(5) 


0.37 


(0) 


1.92 


(1) 


L1-SC2 


19.99 


0.50 


(2) 


0.18 


(1) 


15.29 


(76) 


0.18 


(1) 


0.35 


(2) 


L1-SC3 


47.61 


1.58 


(3) 


0.07 


(0) 


22.15 


(47) 


0.07 


(0) 


0.38 


(1) 


L1-SC4 


46.52 


0.45 


(1) 


0.05 


(0) 


17.94 


(39) 


0.05 


(0) 


0.21 


(0) 



^ Entries for the posterior regrets have been truncated to the closest second digit after the decimal 
point, and entries for the percentage regrets have been truncated to the closest integer. For some 
entries, percentage regrets are smaller than 1 percentage point. 
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Table 4.8. Number of hospitals above threshold for the MRSA data set with three different choices of 
threshold. The number of hospitals above threshold using the optimal classifier for the TCL function is 
reported in the first colunm. Classifications using plug-in estimators are reported as departures from the 
number of hospitals classified above threshold using the vector of posterior medians. For each plug-in 
estimator, the percentage departure has been indicated in parentheses. 



Thresliolds 


Optimal 

rcL 




Number of Hospitals aJjov 


i Threshold'' 




AILE 


SSEL 


WKSEL 


CE 


Gli 


C = 1/1.3 


56 


5 (9) 


-l(-2) 


19 (34) 


2 (4) 


3 (5) 


C = 1.0 


15 


2 (13) 


(0) 


10 (67) 


1 (7) 


(0) 


C= 1.3 


22 


11 (50) 


3 (14) 


21 (95) 


8 (36) 


6 (27) 



* Entries for the percentage departure have been truncated to the closest integer. For some entries, 
the percentage departure is smaller than 1 percentage point. 



terms of percentage regrets was sometimes worse with larger levels of expected counts. This can 
be observed by considering the percentage regrets in parentheses in the eleventh column of table 
4.6 on page 86, and comparing the first part of the table for which SF = 0.1 with the second part of 
the table for which SF = 2.0. A more confusing picture emerged for the WRSEL classifier. In this 
latter case, the type of spatial scenario considered and the level of heterogeneity of the simulated 
ensembles both played a substantial role in determining whether the WRSEL plug-in estimator 
benefited from an increase in SF. The ordering of the different plug-in estimators, however, was 
left unchanged by the use of different levels of expected counts. 

A similar trend can be observed for the RCL function in table 4.7 on page 87. Here, changes 
in SF did not, in general, modify the ordering of the plug-in estimators reported in section 4.4.4. 
The WRSEL classifier, however, was found to be detrimentally affected by an increase of the level 
of the expected counts under most spatial scenarios considered. This appeared to be true both in 
terms of posterior and percentage regrets. For the remaining plug-in estimators, the differences 
in percentage regrets associated an increase in SF was too small to allow the detection of any 
systematic trend, except perhaps for the MLE classifier, whose performance notably improved 
when specifying larger expected counts, albeit these improvements were restricted to the medium 
and high heterogeneity simulations. The different classification procedures considered in this 
chapter have also been applied to a real data example, which we describe in the next section. 

4.5 MRSA Prevalence in UK NHS Trusts 

The prevalence of MRSA in UK hospitals has been under scrutiny for the past 10 years, and 
surveillance data on MRSA prevalence has been made publicly available. The classification of 
National Health Services (NHS) trusts on the basis of MRSA risk may be of interest to medical 
practitioners and patients wishing to minimise their probability to contract the condition. Sev- 
eral statistical issues have been raised and discussed on the basis of this data set, including the 
evaluation of hospital performances over time (Spiegelhalter, 2005) , and the monitoring of over- 
dispersed MRSA counts (Grigg et al., 2009). In this section, we use this data set to illustrate the 
implementation of a decision-theoretic approach to the classification of elements in a parameter 
ensemble. Here, the NHS hospitals constitute the parameter ensemble of interest and our main 
objective is to identify which hospitals have a level of risk for MRSA above a given threshold. 

4.5.1 Data Pre-processing 

The full data set was retrieved from the archives of the UK's Department of Health 
(www. dh.gov. UK). This data set documents MRSA prevalence in all NHS trusts in the UK. 
Hospitals are classified in three categories: (i) general acute trusts, (ii) specialist trusts and (iii) 
single speciality trusts. For each hospital, the data set covers four distinct time periods. In this 
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Table 4.9. Posterior regrets based on TCL(C, 9, S) and RCL(7, 6, 5) for the MRSA data set with three 
different choices of thresholds in each case. The posterior expected loss of the optimal estimator is given in 
the first column. In parentheses, posterior regrets are expressed as percentage of the posterior loss under 
the optimal estimator. Entries have been all scaled by a factor of 10^. 



Loss functions & Thresholds Posterior regrets'^ 





Post. Loss 


MLE 


SSEL 


WRSEL 


CB 


GR 


TCL 














C = 1/1.3 


121.69 


0.80 (1) 


0.13(0) 


27.93 (23) 


0.16(0) 


0.33(0) 


C = 1.0 


125.14 


0.15 (0) 


0.15(0) 


14.78(12) 


0.15(0) 


0.15(0) 


C = 1.3 


65.36 


8.18 (13) 


0.15(0) 


45.62 (70) 


3.85 (6) 


3.77(6) 


RCL 














7 = .60 


131.58 


0.40 (0) 


0.40(0) 


0.32 (0) 


0.40 (0) 


0.40(0) 


7 = .75 


105.63 


1.43 (1) 


0.63(1) 


5.48 (5) 


0.63(1) 


1.55(1) 


7 = .90 


67.65 


4.14 (6) 


1.13 (2) 


0.86 (1) 


1.13(2) 


1.13(2) 



^ Entries for the posterior regrets have been truncated to the closest second digit after the decimal point, 
and the percentage regrets were truncated to the closest integer. For some entries, percentage regrets are 
smaller than 1 percentage point. 



thesis, we will focus on a subset of this longitudinal data, corresponding to the period running 

from April 2003 to March 2004, and consider how to classify NHS trusts according to the number 
of cases in that particular year. The Department of Health provided the prevalence of MRSA for 
each hospital and the number of bed days over that period. (Bed days are the number of inpatient 
hospital days per 1000 members of a health plan.) The NHS trusts with missing data or where no 
cases of MRSA was observed were eliminated, due to the impossibility of retrieving the number of 
bed days for these hospitals. That is, since the Department of Health only provided yearly num- 
bers of observed cases and the rates of MRSA per bed days, the number of bed days could not be 
computed for the hospitals with no observed cases, since the MRSA rates was zero for these trusts. 
Data from the following seven hospitals were therefore discarded: Birmingham Women's Hospital, 
Moorficlds Eye Hospital, Liverpool Women's Hospital, Medway Hospital, Royal National Hospital 
for Rheumatic Diseases, Sheffield Children's Hospital, and the Royal Orthopaedic Hospital. The 
final sample size was composed of 166 trusts with MRSA prevalence for the year 2003-2004. In 
previous statistical piiblications, this data set has been utilised in order to illustrate the inherent 
difficulties associated with the monitoring of changes in levels of risk over several time points. 
Here, by contrast, we are mainly concerned with the classification of hospitals according to their 
levels of risks, as opposed to a consideration of the evolution of trust-specific risk over several 
years. 

4.5.2 Fitted Model 

We represent observed cases of MRSA by yi for each hospital, with i = 1, . . . ,n, and n = 166. 
The expected counts for each NHS trust were computed using the MRSA prevalence per thousand 
bed days and the number of inpatient bed days in each trust. That is, 

Ei := PMRSA X BDi, (4.37) 

where pmrsa ■= 12^=1 Vi / 'l2^=i^^i population MRSA prevalence, and we have assumed 

this rate to be constant across all UK hospitals. The BD^'s denote the number of bed days in the 
i*^ NHS trust in thousands. 

A two-level hierarchical model with Poisson likelihood and a lognormal exchangeable prior on 
the RRs was used to fit the data. This assumes that the counts are Poisson, but with potential 
over-dispersion, due to clustering of cases caused by the infectiousness of the disease or other 
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unmeasured risk factors. The full model had therefore the following structure, 



yi ~ Po\s{9iE^) i = l,...,n, 
log ei = a + Vi (4.38) 

where the inverse of the variance parameter was given a 'diffuse' gamma distribution, ~ 
Gam(0. 5, 0.0005), while a flat Normal distribution was specified for the intercept, a ~ 
iV(0.0, IQ-*'). For each hospital, wc have RR,; := Oj. Since the joint posterior distribution of 
the ^i's is not available in closed-form, an MCMC algorithm was used to estimate this model. The 
MRSA data was fitted using WinBUGS 1.4 (Lunn et al., 2000). The codes used for the estimation 
of the model have been reproduced in Appendix C. 

Five difl'erent ensembles of point estimates were derived from the joint posterior distribution of 
the 6i's, including the MLEs, the posterior means and medians, and the WRSEL, CB and triple- 
goal point estimates. Given the size of the parameter ensemble in this data set, the vector of weights 
in the WRSEL function was specified using ai = a2 = 0.1. (The sensitivity of the performance 
of the WRSEL plug-in estimators to the choice of oi and a2 will be discussed in chapter 5). We 
considered the performance of these plug-in estimators under the TCL and RCL functions with 
three different choices of thresholds in each case. Here, we selected C G {0.77, 1.0, 1.3}, where the 
first and third thresholds are equidistant from 1.0 on the logarithmic scale -that is, 1/1.3 = 0.77. 
For C = 1/1.3, we evaluated the number of hospitals classified below threshold. That is, we were 
interested in identifying the NHS trusts characterised by a substantially lower level of risk for 
MRSA. A choice of a threshold of 1.3, in this study, implies that hospitals above this threshold 
have an MRSA rate, which is 30% higher than the national average. A threshold lower than 1.0 
(i.e. C = 1/1.3) was also selected in order to estimate which hospitals can be confidently classified 
as having substantially lower MRSA rates than the remaining trusts. Note that in that case, the 
definitions of the false positives and false negatives in equations (4.1) and (4.2) become inverted. 
That is, for C < 1.0, we now have 

FP{C,0,6):=X{e>C,S<C}, (4.39) 

and 

FN{C,0,S)~I{e<C,d>C}. (4.40) 

However, since we will solely be interested in unweighted classification loss, in this section, this 
inversion does not aff'ect the computation of the posterior and percentage regrets of the plug-in 
estimators. The results are presented for the TCL and RCL functions, in turn. 

4.5.3 TCL Classification 

In table 4.8 on page 88, we have reported the departure of each plug-in classifier from the optimal 
TCL estimator in terms of number of NHS hospitals classified above -or below, if C = 1/1.3- 
threshold. Remarkably, for the three thresholds (C G {1/1.3, 1.0, 1-3}), almost all plug-in classifiers 
were found to yield a greater number of NHS trusts classified above (respectively below) threshold 
than when using the set of posterior medians. The sole exception to this trend was for the 
SSEL-bascd classifiers, which were modestly more conservative than the posterior medians under 
C = 1/1.3. That is, the ensemble of posterior means produced a smaller set of hospitals considered 
to have RRs lower than 1/1.3. This indicates therefore that the optimal TCL classifier generally 
tends to produce more conservative classifications than its plug-in counterparts. 

Moreover, we also note that, for this particular data set, the GR-based classification was found 
to be closer to the optimal classification than the one based on the CB point estimates. These 
results should be contrasted with OTir spatial and non-spatial data simulations, which have shown 
that the CB and GR plug-in estimators tended to behave similarly under the TCL function. Not 
surprisingly, however, the ensemble of posterior means was found to provide the smallest number 
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(a) MLE 



(b) Posterior Means 




RR estimates RR estimates 



(c) Posterior Medians (d) WRSEL 
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RR estimates RR estimates 




Figure 4.3. Ensembles of point estimates of MRSA RRs under different loss functions for 166 NHS trusts 

during the 2003-2004 period. The panels correspond to the (a) MLEs, (b) posterior means, (c) posterior 
medians, (d) point estimates under WRSEL, (e) constrained Bayes point estimates and (f) triple-goal 
point estimates. Classification of these point estimates is conducted with respect to a threshold taken to 
be C = 1.3 (dashed line). A smoothed version of the histograms has also been superimposed. 
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(a) MLE-based Classification 
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Figure 4.4. Classification of individual 'general acute' NHS trusts (n — 110) during the year 2003- 
2004, on the basis of three different families of point estimates: (a) MLEs, (b) posterior means, and (c) 
posterior medians. The marginal posterior distributions of trust-specific RRs for MRSA are represented 
by box plots (Median, ±sd,±2sd). In each panel, the trusts classified above threshold, C — 1.3 (dashed 
line), are indicated in red. 
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(a) WRSEL Classification 
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Figure 4.5. Classification of individual 'general acute' NHS trusts (n — 110) during the year 2003- 
2004, on the basis of three different families of point estimates: (a) WRSEL, (b) CB and (c) triple-goal 
estimates. The marginal posterior distributions of trust-specific RRs for MRSA are represented by box 
plots (Median, ±sd,±2sd). In each panel, the trusts classified above threshold, C = 1.3 (dashed line), are 
indicated in red. Note the particular poor performance of the WRSEL plug-in classifier in this case. 
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(a) MLE-based Classification 
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(b) Posterior Means Classification 




~r 



100 



200 



300 



400 500 
Bed Days (1000s) 



600 



700 



800 



900 



(c) Posterior Medians Classification 
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Figure 4.6. Classification of individual 'specialist' NHS trusts (n = 43) during the year 2003-2004, on 
the basis of three different families of point estimates: (a) MLEs, (b) posterior means, and (c) posterior 
medians. The marginal posterior distributions of trust-specific RRs are represented by box plots (Median, 
±sd,±2sd). In each panel, the trusts classified above threshold, C = 1.3 (dashed line), are indicated in 
red. 
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(a) WRSEL Classification 
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Figure 4.7. Classification of individual 'specialist' NHS trusts (n = 43) during the year 2003-2004, on the 
basis of three different families of point estimates: (a) WRSEL, (b) CB and (c) triple-goal estimates. The 
marginal posterior distributions of trust-specific RRs are represented by box plots (Median, ±sd,±2sd). 
In each panel, the trusts classified above threshold, C = 1.3 (dashed line), are indicated in red. 
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of misclassified areas in comparison to the optimal classifier. For C = 1.0, one can observe that 
both the SSEL and GR plug-in classifiers and the posterior medians yielded an identical number 
of NHS trusts above threshold. 

In addition, the performance of the different sets of plug-in estimators in terms of posterior 
regret is also reported in table 4.9 on page 89 for the TCL and RCL functions. As aforementioned, 
one can see that the smallest posterior regret is achieved by the SSEL classifier across all three 
thresholds considered. In line with the results of our synthetic simulations, the triple-goal and CB 
plug-in estimators were found to exhibit the second best performance. Both of these classifiers 
tended to behave similarly under both the TCL and RCL functions. These plug-in estimators were 
followed by the MLE and WRSEL classifiers in increasing order of posterior regret. Figure 4.3 
on page 91 shows the empirical ensemble distributions of point estimates for the six ensembles of 
point estimates that we have considered. In that figure, the RRs for MRSA for each NHS trust are 
classified above and below C = 1.3. The pathological behaviour of the WRSEL point estimates 
first discussed in section 3.3.5 is here particularly visible in panel (d), where the bimodality of the 
WRSEL ensemble of point estimates can be seen to emerge. The classification of NHS trusts using 
the six different types of classifiers can also be examined in greater detail in figures 4.4 and 4.5 
on pages 92 and 93, which portray the classifications of the general acute NHS trusts. Similarly, 
the classification of specialist NHS trusts under these different ensembles of point estimates can 
be compared in figures 4.6 and 4.7 on pages 94 and 95, respectively. These box plots provide a 
summary of the hospital-specific marginal posterior distributions of the level of risk for MRSA. 
Specifically, one can see that for general NHS trusts, the set of hospitals classified above an RR 
of 1.3 is lower when we use the set of posterior medians to classify them. This contrasts with 
the adoption of the ensemble of MLEs and posterior means or other sets of point estimates for 
the same task. A similar pattern is visible for the specialist NHS trusts in figures 4.6 and 4.7, 
albeit to a lesser degree, since both the posterior means and posterior medians produce identical 
classifications for this particular class of hospitals. 

4.5.4 RCL Classification 

Results on the use of plug-in estimators under the RCL function, in this data set, are reported in 

table 4.9 on page 89. The different plug-in classifiers can here be compared in terms of posterior 
and percentage regrets. We have computed the performance of these plug-in estimators using 
three different proportions, 7 e {.60, .75, .90}. As pointed out in section 4.3.3, the SSEL and 
CB classifications under RCL are necessarily identical. In addition, the performance of these 
two classifiers was found to be very close to the one of the GR plug-in estimator. These three 
plug-in estimators outperformed the MLE and WRSEL classifiers, as expected from our spatial 
and non-spatial simulation results. As previously mentioned, it is important to note that the 
differences in posterior regrets between different choices of plug-in classifiers were found to be 
small in comparison to the differences in posterior regrets between different choices of classifiers 
under the TCL function, thereby indicating that the choice of plug-in estimator is more important 
under the latter function than when considering rank-based classification. 

4.6 Conclusion 

In this chapter, we have investigated a standard classification question, which often arises in both 
epidemiology and spatial epidemiology. This problem centers on the allocation of a subset of 
elements in a parameter ensemble of interest to an elevated-risk cluster according to the estimated 
RRs for each of the elements in that ensemble. We have showed that such a problem can be 
reformulated in a decision-theoretic framework, where standard machinery can be implemented 
to optimise the solution of the problem. Doing so, we showed that the posterior expected TCL 
function is minimised by the ensemble of posterior medians. In addition, we have also considered 
the RCL function, whose properties and minimiser have already been documented by Lin et al. 
(2006). 
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As in chapter 3, our main focus in this chapter has been on evaluating tlic performance of 
various plug-in estimators under both the TCL and RCL functions. Overall, the ensemble of pos- 
terior means was found to perform close to optimality under both decision-theoretic paradigms, 
thus indicating that this standard choice of point estimates could constitute a good candidate for 
routine epidemiological investigations. In particular, the posterior means have the non-negligible 
advantage of familiarity, which may aid the reporting of such point estimates in public health 
settings. The good performance of the SSEL point estimates under the TCL function can be 
explained by noting that, in most applications, the marginal posterior distributions of the param- 
eters of interest will be asymptotically normally distributed. In such cases, as we gather more 
information about each parameter in the ensemble, the set of posterior means converges towards 
the set of posterior medians. 

In section 4.4.3, we noted that the ensemble of SSEL point estimates also performed well 
under weighted TCL. In that section, wc reported the results of a set of simulations based on 
TCLo.8, which gives a greater penalty to false positives. However, in a separate set of simulations 
(results not shown), we also evaluated the performance of the different plug-in estimators under 
TCLo.2. which gives a greater penalty to false negatives. In that case, we have found that the 
set of posterior means was outperformed by other plug-in classifiers under several experimental 
conditions. These discrepancies between different weighted TCLs can be explained in terms of 
hierarchical shrinkage. The posterior means tend to be shrunk towards the prior mean, and will 
therefore constitute a naturally conservative choice that is likely to produce less false positives, 
but a greater amount of false negatives. 

The GR and CB classifiers produced good performance on the TCL and RCL functions, as 
measured by the posterior and percentage regrets. These plug-in estimators were followed by 
the MLE and WRSEL classifiers. However, the performance differences between the candidate 
estimators under the TCL decision-theoretic paradigm were much larger than the differences in 
posterior regrets between the plug-in estimators under the RCL function. These discrepancies may 
be explained in terms of the substantial differences between these two loss functions. As described 
in section 4.2.2 on page 74, the optimisation of the RCL function only requires an adequate ordering 
of the elements in the ensemble under scrutiny. For the TCL function, however, minimisation 
necessitates not only a good ordering of the elements in the ensemble, but also a reasonably 
precise estimation of the values of each of these elements. It appears that the combination of 
these two desiderata makes the task of optimising the posterior TCL more difficult than the one 
of minimising the RCL one, thereby yielding greater discrepancies between the different candidate 
plug-in estimators. 

In spatial epidemiology, it is often of interest to consider weighted classification loss functions, 
which may, for instance, privilege true positives over false positives, or the converse. This is an 
issue, which has been addressed by Richardson et al. (2004), who discussed different re- weighting 
of the false positive and false negative rates. They used a numerical minimisation procedure 
based on synthetic data sets, which showed that an area should be classified above a threshold 
if approximately 80% of the mass of the posterior distribution of the level of risk in that area is 
above the threshold of interest (for the BYM and BYM-Ll models). In section 4.4.3, we have 
seen that this choice of decision rule is equivalent to the specification of a weighted TCLp with 
p = .80. Thus, the decision rule proposed by Richardson et al. (2004) gives a greater penalty 
to false positives. This represents a natural choice of classification framework in the context of 
spatial epidemiology, which deprecates the number of potential false alarms. The adoption of a 
conservative approach to the identification of institutions or geographical areas as characterised 
by "elevated risk" is amply justified by the sensitive nature of public health studies and their 
extensive media coverage. 

In spite of our emphasis on the use of suboptimal classifiers, wc also note that our analysis of the 
MRSA data set has shown that the use of the set of posterior medians yield the most conservative 
classification of the NHS trusts. In an area as sensitive as the level of risk for contracting MRSA, 
one may prefer to err on the side of caution by classifying as few hospitals as possible as part 
of an elevated-risk cluster. From a public health perspective, Grigg et al. (2009) among others 
have argued in favour of the utilisation of conservative communication strategies when reporting 
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surveillance data. When communicating such epidemiological findings to a general audience, the 
use of the optimal estimator under the TCL function may therefore be usually preferred in order 
to attenuate the potential detrimental effects of too liberal a classification. 
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4.7 Proof of TCL Minimisation 

Proof of proposition 1 on page 71. 

Let Pp{C, 0, 0'^^^) denote E[TCLp(C, 6, d''^^)\y]. We prove the result by exhaustion over three cases. 
In order to prove that 

PpiC e, < ppic e, 0-*), (4.41) 

for any 6'<^^* e with 6'f "^^ := Qe,\y{l-p), it suffices to show that pp{C, 9^, 9^^'^^) < pp{C, 9^,9f^) 
holds, for every i = 1, . . . , n. Expanding these unit-specific risks, 

pX{9t''~' > C}P [9, < C\y] + (1 - p)I{9t'^ < C}P [9, > C\y] 
< pl{9r' > C}F [9i < C\y] + (1 - p)I{^r < C}P [9i > C\y] . 

Now, fix C and p G [0, 1] to arbitrary values. Then, for any point estimate 9°^^, we have 



pm<c\y], if 0r>c, 

{1 - p)P[9, > C\y], ii9r'<C. 



pac, eu or) = { ^: ^^^tjj : ^ (4-43) 



The optimality of 9^^ over as a point estimate is therefore directly dependent on the 
relationships between and C, and between and C. This determines the following three 

cases: 

i. If 9^j^~^^ and ^f''* are on the same side of C, then clearly, 

Pp{C, 9i, -^)) = Pp{C, 9,, 9r'), (4.44) 

ii. If 6*^ < C and 6'f * > C, then, 

PpiC 9i, -^)) = (1 - pn9i > C\y] < p¥[9, < C\y] = p^iC 9^, ^f), (4.45) 



ill. If 6*^ "^^ > C and 6if ^ < C, then. 



Pp{C, 9i, -^)) = pF[9i < C\y] < (1 - p)F[9i > C\y] = Pp{C, 9^, 9f), (4.46) 

Equation (4.44) follows directly from an application of the result in (4.42), and cases two and 
three follow from consideration of the following relationship: 



9i<C\y]^{l-p)¥[9i>C\y], (4.47) 
where ^ means either <, = or >. Using ¥[9i > C\y] = 1 — F[9i < C\y], this gives 

n9i<C\y]=Fe,\y{C)^l-p. (4.48) 
Here, Fg.^y is the posterior CDF of 9i. Therefore, we have 

C I F-^lil -p) =: Qo,\yil-p) ■■=■■ 9i'-^\ (4.49) 

where ^ takes the same value in equations (4.47), (4.48) and (4.49). 

This proves the optimality of d^'^~P\ Moreover, since one can construct a vector of point 
estimates ^|^* satisfying 6f^^ ^ C, whenever ^-^"^^ ^ C, for every i, it then follows that is 
not unique. 
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Chapter 5 

Discussion 



In this final chapter, we review the main findings of the thesis, and explore possible extensions of 
this work in several directions. In particular, we consider whether the loss functions used in this 
thesis can be generalised in order to take into account more sophisticated inferential requirements. 
Moreover, we also discuss how different modelling assumptions may better serve the decision- 
theoretic questions addressed in the thesis. 

5.1 Main Findings 

In the present thesis, we have adopted a formal decision-theoretic approach to two inferential 
issues that naturally arise in epidemiology and spatial epidemiology. Firstly, we have considered 
how the estimation of the heterogeneity of a parameter ensemble can be optimised. Secondly, we 
have derived the optimal estimator for the classification of the elements of a parameter ensemble 
above or below a given threshold. For consistency, epidemiologists are generally under pressure to 
report a single set of point estimates when presenting their research findings. We have therefore 
also explored the utilisation of various plug-in estimators based on more commonly used sets of 
point estimates in order to identify the ensembles of point estimates that may simultaneously 
satisfy several of these inferential objectives. 

Overall, our experimentation and real data analyses have shown that the GR plug-in estimator 
is very close to optimality when evaluating the dispersion of a parameter ensemble, as quantified by 
the posterior QR. By contrast, the best quasi-optimal plug-in estimator under both the TCL and 
RCL functions was found to be the one based on the SSEL point estimates. Taken together, these 
results confirm the inherent difficulties associated with attempting to satisfy several inferential 
desiderata using a single set of point estimates. Ultimately, the two objectives considered in this 
thesis have been found to be optimised by two different sets of point estimates. Nonetheless, 
further research could investigate the formulation of nested decision-theoretic procedures in the 
spirit of the triple-goal methods introduced by Shen and Louis (1998), which could sub-optimally 
satisfy these two goals using a single set of point estimates. Specifically, one could construct a 
decision-theoretic framework where the two inferential objectives are weighted, hence allowing 
different modellers to express their preference for one goal over another in a formal manner. 

Another of the consistent findings in this thesis has been the relatively poor performance of 
the WRSEL introduced by Wright et al. (2003). In most of the spatial and non-spatial synthetic 
simulations considered, this plug-in estimator tended to perform on a par with the MLE plug-in 
estimator. The main objective of the WRSEL function is to counteract the effect of hierarchical 
shrinkage in the models studied. Point estimates based on WRSEL have proved to be useful in 
improving the estimation of the 'spread' of parameter ensembles. However, the WRSEL estimation 
procedure was found to lack portability from one data set to another. Specifically, the main 
difficulty with this approach resides in the necessary pre-specification of the set of weights that 
recalibrate the loss function. We have seen that these (/ij's are determined by both the number of 
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elements in the ensemble and the values taken by two tuning parameters, ai and 02, which control 
the symmetry and the magnitude of the re-weighting effected by the WRSEL function. A step 
towards the automatisation of the use of the WRSEL for any data set may, for instance, include 
the specification of both oi and 02 as functions of the size of the ensemble. 

Our work on classification loss was originally motivated by the paper of Richardson et al. 
(2004) on the identification of geographical areas with "elevated risk" . Our exploration of various 
classification loss functions in chapter 4 has demonstrated that the decision rule proposed by 
Richardson et al. (2004) is equivalent to the utilisation of a weighted TCL function based on 
p = .80. This particular choice of decision rule can be shown to give a greater penalty to false 
positives than to false negatives, thereby safeguarding the public and decision makers against 
potential false alarms. 



5.2 Generalised Classification Losses 

In chapter 4, we have described and used a classification loss function based on the ranking of the 
elements in a parameter ensemble. The RCL function was originally introduced by Lin et al. (2006) 
under a variety of guises. These authors suggested the use of several generalised RCL functions 
that combined penalties for misclassifications with explicit penalties for the distance of the point 
estimates from the threshold of interest. Their decision-theoretic framework was developed on 
the basis of rank classification but could easily be adapted to the problem of classifying elements 
in a parameter ensemble with respect to a threshold on the natural scale of these elements. The 
basic principles underlying the TCL function introduced in this thesis, could therefore be extended 
following the lead of Lin et al. (2006), in the following three directions. Here, each loss function 
takes three parameters, p,q,b> 0, 

41 (p, q, b):=^J2\^- FP(C, 9., ef) + b\C - FN(C, 6,, 9f), 

i=l 
1 " 

4(p,g,6) :=-Y,\C-eifFP{CA,Or) + b\C-ei\'iFN{C,ei,er), (5.1) 

1=1 

Ll^ip, q, b):=-J2 - Sil" FP(C, ^i, *) + - Oi]" FN(C, 6^ ^f*); 

where FP and FN are defined as in equations (4.1) and (4.2), respectively, on page 70. As for the 
TCL, none of these families of loss functions produce a penalty if the true parameter 9i and the 
point estimate 9f^*^ are both above or below the cut-off point C. (Note, however, that this is not 
true for the expected TCL, where integration over the parameter space will necessarily yield a loss 
greater than zero, regardless of the correctness of the classification). If 6i and 9^^^ are not on the 
same side of C, Z/^ penalises the estimation procedure by an amount which is proportional to the 
distance of 9^^^ from C; L^^ penalises by an amount which is proportional to the distance of 9i 
from C; and penalises by an amount which is proportional to the distance between 9^^^ and 
9i . The parameters p and q permits to vary the strength of the penalties associated with the false 
positives and the false negatives, respectively. Moreover, a final parameter, b, allows to further 
adjust these two types of penalties by directly re-weighting their relative importance. Although 
lI-, and are of theoretical interest, it should be clear that these particular loss families would 
be difficult to implement, in practice, because they rely on the unknown quantities, 0,;'s. 

The three generalisations of the TCL family in equation (5.1) will require specific optimisa- 
tion procedures as was demonstrated by Lin et al. (2006), who considered similar generalisations 
with respect to rank-based classification. Since the minimisation of the expected TCL and RCL 
functions were found to differ substantially, it is not expected that the work of Lin et al. (2006) 
on generalised RCLs will necessarily be relevant to the minimisation of the generalised TCLs de- 
scribed in equation (5.1). Further research will therefore be needed to explore these issues and 
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investigate the behaviour of various plug-in estimators under these generalised versions of the TCL 
function. 

5.3 Modelling Assumptions 

Since the choice of a particular loss function only affects the post-processing of the joint posterior 
distribution of the parameter ensemble of interest, such methods are only as good as the statistical 
model on which they are based. The use of different models may therefore lead to substantial 
improvements in either estimating the true heterogeneity of the ensemble or in classifying the 
elements of that ensemble. This is a concern, which has already been voiced by several authors. 
Shen and Louis (1999), for instance, have noted the dependence of their triple-goal estimator's 
performance on the quality of the ensemble prior distribution specified for the parameters of 
interest. It is this type of concern that led these authors to propose the Smoothing by Roughening 
(SBR) algorithm in order to produce an Empirical Bayes (EE) prior distribution for the ^j's 
(Shen and Louis, 1999). For spatial models, the critical influence of modelling assumptions on the 
validity of the subsequent estimation of the level of heterogeneity in a parameter ensemble has 
been highlighted by Lawson (2009) and Ugarte et al. (2009b). 

In chapter 2, we have classified the models studied throughout the thesis in three categories 
depending on the choice of second-level priors. This included (i) proper conjugate iid priors, (ii) 
non-conjugate proper iid priors and (iii) non-conjugate improper non-idd priors on the 9i's. It may 
be of interest to consider non-parametric extensions of these models, such as ones based on the 
Dirichlet process prior (MacEachern, 1994, MacEachern and Muller, 1998). The specification of 
such priors in BHMs tends to give more flexibility to the model. In our case, this may particularly 
help to evaluate the shape of the ensemble distribution with greater accuracy, which could yield 
better estimates of the heterogeneity of the ensemble. The combination of a parametric likelihood 
function with a non-parametric hierarchical prior is often referred to as a semi-parametric model. 
When combined with the Q-SEL or QR-SEL functions, such semi-parametric Bayesian models 
may be particularly well-suited to the type of inferential problems encountered in epidemiology, 
where the estimation of the properties of the empirical distributions of parameter ensembles is 
especially important. 

In terms of the classification of the elements of an ensemble, Ugarte et al. (2009a) have com- 
pared the performance of a HBM with an EB strategy, and have concluded that the full Bayesian 
framework may be preferable for the classification of elements in a parameter ensemble. In ad- 
dition, a natural modelling approach to this problem may be the use of mixture models. One 
of the important limitations of the procedure developed in chapter 4 is that we are imposing 
the number of categories and fixing a particular cut-off point. When one is simply interested in 
identifying the elements of an ensemble, which are characterised by a level of risk greater than a 
certain threshold, this classification procedure could be sufficient. However, if one is interested in 
the number of categories per se, a more sophisticated modelling strategy may be adopted, where 
inference about the number of categories can be directly conducted. Richardson et al. (2004), for 
instance, have considercxi the use of the reversible-jump ^IC^IC algorithm in conjunction with a 
decision-theoretic framework in order to classify the elements of an ensemble (see also Green, 1995, 
Richardson and Green, 1997). As explained in the introduction of chapter 4, Richardson et al. 
(2004) have calibrated their choice of classification thresholds differently, when utilising different 
models. While we have shown that the use of the posterior median is optimal under the posterior 
TCL function, it may nonetheless be of interest to investigate how specific modelling outputs, such 
as the distribution of the number of mixture components in a semi-parametric mixture models 
could be used in the context of classifying the elements of an ensemble in terms of levels of risk. 
More specifically, the posterior distribution of the number of mixture components could aid with 
the determination of the ideal cut-off value upon which the classification exercise is conducted. 
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Appendix A 

Non-Spatial Simulations 



In this appendix, we provide some descriptive statistics of the non-spatial simulated data sets used 
in chapters 3 and 4. 
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Table A.l. Descriptive statistics for the simulated j/i's. Presented for the Normal-Normal model 
in equation (2.20) and the Gammma-Inversc gamma model in equation (2.21), and for 3 different levels 
of RLS (ratio of the largest to the smallest Uj), and 3 different values for n, averaged over 100 synthetic 
data sets. 



Models & Shrinkage Descriptive Statistics 

Mean SD 2.5'" Qua. Median 97.5'" Qua. 

RLS = 1 

N-N, n = 100 -0.002 

N-N, n = 200 0.001 

N-N, n = 1000 0.004 

G-IG, n = 100 1.000 

G-IG, n = 200 0.992 

G-IG, n = 1000 0.997 

RLS = 20 

N-N, n = 100 0.014 

N-N, n = 200 0.022 

N-N, n = 1000 0.005 

G-IG, n = 100 1.408 

G-IG, n = 200 1.413 

G-IG, n = 1000 1.412 

RLS = 100 



N-N, n = 100 


0.023 


1.786 


-3.419 


-0.001 


3.647 


N-N, n = 200 


-0.003 


1.767 


-3.562 


-0.010 


3.563 


N-N, n = 1000 


0.006 


1.773 


-3.609 


0.002 


3.635 


G-IG, n = 100 


2.191 


3.807 


0.000 


0.733 


11.791 


G-IG, n = 200 


2.101 


3.690 


0.000 


0.712 


11.589 


G-IG, n = 1000 


2.130 


3.749 


0.000 


0.704 


11.885 



1.414 
1.410 
1.419 
1.253 

1.325 
1.396 



-2.682 
-2.709 
-2.766 
0.027 
0.024 
0.020 



-0.006 
-0.001 
0.006 
0.588 
0.575 
0.572 



2.641 
2.707 
2.775 
4.252 

4.308 
4.478 



1.569 
1.554 

1.558 
2.099 
2.190 
2.235 



-2.984 
-2.998 
-3.101 
0.001 
0.001 
0.000 



0.016 
0.026 
0.007 
0.673 
0.667 
0.659 



3.014 
3.053 

3.090 
6.688 
6.955 
7.071 



Table A. 2. Descriptive statistics for the simulated (Ti's. Presented for the Normal-Normal model 

in equation (2.20) and the Gammma-Invcrse gamma model in equation (2.21), and for 3 different levels 
of RLS (ratio of the largest to the smallest ai), and 3 different values for n, averaged over 100 synthetic 
data sets. 



Models & Shrinkage 






Descriptive Statistics 








Mean 


SD 


2.5'" Qua. 


Median 


97.5'" Qua. 


RLS = 1 












n= 100 


1.000 


0.006 


0.991 


1.000 


1.009 


n = 200 


1.000 


0.006 


0.991 


1.000 


1.009 


n = 1000 


1.000 


0.006 


0.991 


1.000 


1.010 


RLS = 20 












n= 100 


1.398 


1.135 


0.246 


0.993 


4.027 


n = 200 


1.426 


1.152 


0.244 


1.017 


4.125 


n = 1000 


1.415 


1.147 


0.241 


1.001 


4.147 


RLS = 100 












n= 100 


2.158 


2.503 


0.117 


1.022 


8.627 


n = 200 


2.119 


2.464 


0.115 


1.006 


8.663 


n = 1000 


2.139 


2.487 


0.113 


0.999 


8.839 
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Appendix B 

Spatial Simulations 



In this appendix, we provide some descriptive statistics of the spatial simulated data sets used in 
chapters 3 and 4. 



105 



Table B.l. Descriptive statistics for the i?i's used in the synthetic data simulations. These 

are here presented with respect to three different values of the scaling factor (SF) , controlling the level of 
the expected counts. The Ei's reported here correspond to the expected counts for lung cancer in West 
Sussex adjusted for age only, occurring among males between 1989 and 2003. These data points have been 
extracted from the Thames Cancer Registry (TCR). 

Scaling Factor Descriptive Statistics 

Mean SD 2.5"^ Qua. Median 97.5'-'^ Qua. 



SF = 0.1 17 9 5 15 39 

SF = 1 170 88 52 149 389 

SF = 2 340 175 105 298 778 



Table B.2. Descriptive statistics for the Relative Risks (RRs) in the synthetic spatial simula- 
tions. Presented under three different levels of variability (low, medium and high), where the parameters 
controlling the levels of variability are modified according to the spatial scenario (i.e. SCI, SC2, SC3 and 
SC4) considered. Note that while the RRs were simulated from specific statistical models in SC3 and SC4, 
they were set to specific values in SCI and SC2. 



Scenarios & Variabilitt 






Descriptive Statistics 








Mean 


SD 


2.5"" Qua. 


Median 


97.5*" Qua. 


Low Variab. 












SCI {LR = 1.5) 


1.018 


0.093 


1.000 


1.000 


1.500 


SC2 {LR = 1.5) 


1.107 


0.206 


1.000 


1.000 


1.500 


SC3 (a = 0.1) 


1.027 


0.283 


0.604 


0.992 


1.622 


SC4 Ip = 0.2) 


1.025 


0.243 


0.696 


0.971 


1.590 


Medium Variab. 












SCI {LR = 2) 


1.036 


0.187 


1.000 


1.000 


2.000 


SC2 {LR = 2) 


1.215 


0.412 


1.000 


1.000 


2.000 


SC3 {a = 0.2) 


1.194 


0.513 


0.535 


1.077 


2.408 


SC4 (/? = 0.3) 


1.055 


0.367 


0.612 


0.954 


1.938 


High Variab. 












SCI {LR = 3) 


1.072 


0.373 


1.000 


1.000 


3.000 


SC2 {LR = 3) 


1.430 


0.824 


1.000 


1.000 


3.000 


SC3 {a = 0.3) 


1.193 


0.598 


0.465 


1.049 


2.641 


SC4 {13 = 0.4) 


1.095 


0.515 


0.531 


0.935 


2.358 
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Table B.3. Descriptive statistics for the simulated j/i's. Presented for three choices of the Scaling 
Factor (SF) of the expected counts, three different levels of variability (low, medium and high), and four 
different spatial scenarios (i.e. SCI, SC2, SC3 and SC4). 



Scenarios & Variability Descriptive Statistics 







IVlcctil 




9 Dim 




Q7 Oiia 


O-T — 


— p—j = :r- r-r 

- (J.l, Low Vanao. 












SCI 


{LR = 1.5) 


4.300 


3.061 


1.000 


3.775 


12.144 


SC2 


'LR = 1.5) 


4.312 


3.084 


1.000 


3.825 


11.725 




C — U. ±J 


4.314 


^ 1 89 


1 nnn 




12.219 




( a — n 




^ 9Q1 


1 nnn 


3 ^^n 


12.463 


or — 


— Ill A// Cf/l^oifnrt \/ ftnr'^ ri 

- u.ij ivitmiufft vUii lUiu. 












SCI 


{LR = 2) 


4.311 


3.382 


1.000 


3.625 


12.494 


SC2 


'LR = 2) 


4.325 


3.458 


1.000 


3.275 


13.350 






'i.OO J. 




1 nnn 


^ 3nn 


1 ^ ^44 


sn4. 


( a — o\ 
^fj — yj.o J 






1 nnn 


3 1 nn 


1 3 fifiQ 


or — 


— II 1 r-f 1 rtrt 1 / flT^ n 

- U-ij nLyib vUil lUiU. 












SCI 


{LR = 3) 


4.304 


3.982 


1.000 


3.050 


14.475 


SC2 


' LR = 3) 


4.345 


4.177 


1.000 


3.000 


15.331 




— KJ.OJ 


4.341 




1 nnn 


3 3nn 


1 3 fil Q 




( a — A\ 
(P — u.^j 




4.210 


1 nnn 

J- .uuu 


3 n'ln 




or — 


= 1, Low Vaj'tah. 












SCI 


LR = 1.5) 


42.440 


24.587 


10.294 


36 850 


1 n4 nnn 


SC2 


'LR = 1.5) 


42.440 


25.761 


9.931 


36.125 


104.469 




(rr — n ^^ 


42.440 


9'i fiSI 




3fi 47^ 


1 nfi 3fiQ 




( fi — n o\ 


42.440 


9S "i^Q 


O.O ( o 


ou.uou 


114.831 


or — 


- 1, ivieatuiitr vUiiiau. 












SCI 


LR = 2) 


42.440 


27.544 


10.350 


36 nnn 


1 22 qn6 


SC2 


'LJ? = 2) 


42.440 


30 502 


8.750 


33 225 


122.244 


sr"H 


(•^ _ fl 9^ 

> — U.^ y 


42.440 


■^n 41 8 


7 'H81 


34 f!9^ 


1 1 7 fi^fi 




IP — y>-c>) 


49 44n 




R 81 


ou.ouu 


1 oc: coy 


or — 


- 1, High Variab. 












SCI 


{LR = 3) 


42.440 


34.910 


9.638 


34.100 


175 525 


SC2 


(Lii = 3) 


42.440 


38.826 


6.725 


29.275 


151.644 


SC3 


(cr = 0.3) 


42.440 


31.448 


7.744 


33.500 


123.769 


SC4 


(/J = 0.4) 


42.440 


38.143 


6.081 


32.700 


135.731 


SF = 


= 2, iow Variab. 












SCI 


{LR= 1.5) 


84.886 


48.263 


22.631 


73.575 


200.469 


SC2 


(L7?= 1.5) 


84.886 


51.058 


21.744 


71.700 


212.281 


SC3 


(cr = 0.1) 


84.886 


53.573 


18.131 


72.200 


220.244 


SC4 


(/J = 0.2) 


84.886 


56.077 


17.500 


74.275 


227.494 


SF = 


= 2, Medium Variab. 












SCI 


{LR = 2) 


84.886 


53.721 


22.181 


72.175 


216.163 


SC2 


(L7i = 2) 


84.886 


59.674 


20.056 


66.075 


248.344 


SC3 


(cr = 0.2) 


84.886 


56.546 


18.444 


69.750 


231.025 


SC4 


(/? = 0.3) 


84.886 


65.128 


14.856 


70.825 


250.919 


5F = 


= 2, iJij/i Variab. 












SCI 


{LR = 3) 


84.886 


68.050 


20.881 


69.400 


268.369 


SC2 


(L7? = 3) 


84.886 


77.636 


16.775 


57.625 


316.769 


SC3 


{a = 0.3) 


84.886 


64.130 


15.100 


67.325 


244.894 


SC4 


(/3 = 0.4) 


84.886 


75.035 


12.675 


66.125 


271.725 
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Appendix C 

WinBUGS Codes 



C.l CAR Normal (BYM) Model 

CAR Normal 

# CAR (convolution) Normal. 
####################################### 
model{ 

for (i in 1:N){ 

y[i] ~ dpois (lambda [i] ) 
log(lambda[i] ) <- log(E[i]) + theta[i] 
theta[i] <- alpha + v[i] + u[i] 
RR[i] <- exp(theta[i] ) 
v[i] ~ dnorm(0,tau2_v) 

} 

############### 

# CAR prior: 

u[l:N] ~ car. normal (adj [] .weights [] ,num[] ,tau2_ 
for (j in 1 : sumNumNeigh)-[ 
weights [j] <- 1.0 

} 

############### 

# Hyperpriors. 
alpha ~ dflatO 

# Scaling Parameters. 
tau2_v ~ dgamma (0.5, 0.0005) 
tau2_u ~ dgamma(0. 5,0. 0005) 
sig_v <- sqrt(l/tau2_v) 
sig_u <- sqrt(l/tau2_u) 

}# EoF 
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C.2 CAR Laplace (LI) Model 

CAR Laplace 

# CAR (convolution) LI. 

################################################## 

model{ 

for (i in 1:N){ 

y[i] ~ dpois (lambda [i] ) 
logdambdaEi] ) <- log(E[i]) + theta[i] 
theta[i] <- alpha + v[i] + u[i] 
RR[i] <- exp(theta[i] ) 
v[i] ~ dnorin(0,tau2_v) 

} 

############### 

# CAR prior 

u[l:N] ~ car . iKadj [] .weights [] ,nmn [] ,tau2_u) 
for (j in 1 : siiinNuinNeigh)-[ 
weights [j] <- 1.0 

} 

############### 

# Hyperpriors . 
alpha ~ dflatO 

# Scaling Parameters. 
tau2_v ~ dgamma(0. 5,0. 0005) 
tau2_u ~ dgamma (0.5, 0.0005) 
sig_v <- sqrt (l/tau2_v) 
sig_u <- sqrt(l/tau2_u) 

}# EoF 
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C.3 MRSA Model 



MRSA Model 

# Loglinear Model. 
################################################## 
model{ 

for (i in l:n){ 

y[i] ~ dpois (lambda [i] ) 

log (lambda [i] ) <- log(E[i]) + theta[i] 

theta[i] <- alpha + v[i] 

RR[i] <- exp(theta[i] ) 

v[i] ~ dnorm(0,tau2_v) 

} 

############### 

# Hyperpriors . 

alpha ~ dnorm(0.0,1.0E-6) 

# Scaling Parameters. 
tau2_v ~ dgamma(0. 5, 0.0005) 
sig_v <- sqrt (l/tau2_v) 

}# EoF 
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